I’m training a robot that takes an angular velocity and a linear velocity as actions to move to a given target position, with a 3D raycast sensor added to avoid obstacles.
The environment seemed extremely straightforward to me, but the agent still doesn’t solve it even after training on two different configurations: one for 10 hours, which yielded a weird behavior of always going back, and another for 7 hours, which yielded an even weirder behavior of the agent going a tiny bit forward and then backward xD
Sadly, this is the third environment I’ve tried that doesn’t converge. I think there is something critical that I’m missing. Can anyone guide me here? I’m starting to lose hope.
Rewards:
Time penalty: -0.00025
Distance reward: -(distance/10), "to scale it lower than 1"
OnCollisionStay: -1 for each frame the agent is colliding with an obstacle
OnTriggerEnter: +100 when the agent collides with the target
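To get a feel for the relative scale of these terms, here is a minimal sketch in Python (not the original C# code, just standalone arithmetic). It assumes every term is applied per step and that all terms that apply in a step are simply summed; the function name `step_reward` and its arguments are hypothetical.

```python
# Sketch of the per-step reward described above. How the terms combine
# (all summed within one step) is an assumption, not the original C# code.
TIME_PENALTY = -0.00025      # from the post
COLLISION_PENALTY = -1.0     # per frame in contact with an obstacle
TARGET_REWARD = 100.0        # on reaching the target


def step_reward(distance, colliding=False, reached_target=False):
    """Return the reward for one step, summing every term that applies."""
    reward = TIME_PENALTY - distance / 10.0
    if colliding:
        reward += COLLISION_PENALTY
    if reached_target:
        reward += TARGET_REWARD
    return reward


if __name__ == "__main__":
    # Compare a free step with a colliding step at a few distances.
    for d in (1.0, 5.0, 20.0):
        print(f"distance={d:5.1f}  free={step_reward(d):9.5f}  "
              f"colliding={step_reward(d, colliding=True):9.5f}")
```

Under these assumptions the distance term only stays below 1 in magnitude while the distance is below 10, and at a distance of 5 it is roughly 2000 times larger than the time penalty.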
Configuration:
behaviors:
  MobileRobot:
    trainer_type: ppo
    hyperparameters:
      batch_size: 1024
      buffer_size: 10240
      learning_rate: 0.0003
      beta: 0.005
      epsilon: 0.2
      lambd: 0.95
      num_epoch: 3
      learning_rate_schedule: linear
    network_settings:
      normalize: true
      hidden_units:
      num_layers: 3
      vis_encode_type: simple
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
    keep_checkpoints: 5
    max_steps: 5000000
    summary_freq: 5000
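As a rough reading of what these numbers imply, the sketch below works through the update arithmetic in Python. It assumes the usual PPO interpretation of the fields (the buffer is split into minibatches of `batch_size` and passed over `num_epoch` times), one experience per counted step, and a linear schedule that decays the learning rate to about zero at `max_steps`; the helper `linear_lr` is hypothetical.

```python
# Values copied from the configuration above.
batch_size = 1024
buffer_size = 10240
num_epoch = 3
max_steps = 5_000_000
learning_rate = 0.0003

# Minibatches in one pass over a full buffer.
minibatches_per_epoch = buffer_size // batch_size
# Gradient steps taken each time the buffer fills.
gradient_steps_per_update = minibatches_per_epoch * num_epoch
# Full buffers collected over the run (assumes one experience per step).
policy_updates = max_steps // buffer_size


def linear_lr(step):
    """Assumed linear decay from learning_rate to 0 at max_steps."""
    frac = min(max(step / max_steps, 0.0), 1.0)
    return learning_rate * (1.0 - frac)


if __name__ == "__main__":
    print("minibatches per epoch:", minibatches_per_epoch)
    print("gradient steps per update:", gradient_steps_per_update)
    print("policy updates over the run:", policy_updates)
    print("learning rate halfway through:", linear_lr(max_steps // 2))
```

With these values, each filled buffer gives 30 gradient steps, and the run contains 488 full buffers.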
I randomize the agent’s rotation and position each episode, along with the target and the obstacles, such that they don’t overlap.
The observations are:
sensor.AddObservation(transform.position);
sensor.AddObservation(transform.rotation.eulerAngles);
sensor.AddObservation(target.position);
sensor.AddObservation(maxAngularSpeed);
sensor.AddObservation(maxLinearSpeed);
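For reference, the size of the resulting vector observation can be tallied as below. This is plain Python arithmetic, not Unity code; it relies only on a Vector3 contributing 3 floats and a single float contributing 1, and the dictionary name is hypothetical.

```python
# Number of floats each AddObservation call above contributes.
# A Vector3 adds 3 floats and a single float adds 1.
observation_sizes = {
    "transform.position": 3,
    "transform.rotation.eulerAngles": 3,
    "target.position": 3,
    "maxAngularSpeed": 1,
    "maxLinearSpeed": 1,
}

total = sum(observation_sizes.values())
print(total)  # 11
```

The vector observation Space Size in the agent's Behavior Parameters would need to match this total; the raycast sensor's observations come from its own component and are not part of this count.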
After 6 more hours of training and testing some environment tweaks, the robot still doesn’t converge. When I tested the trained brain on the agent in the simulation (since I trained it from the command line with the --no-graphics option), it acted as if it knew nothing about the environment: it collides with boundaries and walls and doesn’t even try to get close to the target.
Update after 50M steps:
The robot moves very confidently now (meaning it doesn’t go back and forth as it used to). However, there are 3 very weird behaviors:
- It never seeks the goal, even if the goal is right in front of it.
- Instead, if it is close to the boundaries, it goes straight to them to terminate the episode.
- Or it just stays “near” an obstacle and keeps hovering there.
None of this can be explained by my reward function at all!
I give it a negative reward that decreases linearly in magnitude as it gets closer to the target, a flat negative reward when it hits obstacles, and a flat negative terminating reward when it hits the boundaries of the map.
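One way to check the reward description above against the observed behavior is to compare discounted returns for two simplified episodes. The Python sketch below does that under strong assumptions: the distance to the target stays constant during the episode, the boundary penalty is -1 (the post gives no value, so `BOUNDARY_PENALTY` is hypothetical), and the episode lengths of 20 and 1000 steps are made up for illustration.

```python
GAMMA = 0.99             # from the configuration in the post
TIME_PENALTY = -0.00025  # from the post
BOUNDARY_PENALTY = -1.0  # hypothetical: the post gives no value


def discounted_return(rewards, gamma=GAMMA):
    """Sum of gamma**t * r_t over one episode."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


def episode_rewards(distance, steps, final_bonus=0.0):
    """Constant-distance episode: same step reward, plus a final term."""
    per_step = TIME_PENALTY - distance / 10.0
    rewards = [per_step] * steps
    rewards[-1] += final_bonus  # e.g. the terminating boundary penalty
    return rewards


if __name__ == "__main__":
    crash_early = discounted_return(episode_rewards(5.0, 20, BOUNDARY_PENALTY))
    hover_long = discounted_return(episode_rewards(5.0, 1000))
    print(f"crash into boundary after 20 steps: {crash_early:8.2f}")
    print(f"hover at distance 5 for 1000 steps: {hover_long:8.2f}")
```

With these assumed numbers the early crash has the less negative return, which would at least be consistent with the boundary-seeking behavior. Whether that is what actually happens in training depends on the real penalty values and episode lengths, which this sketch does not know.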