Hi everyone,
I hope you are having a fantastic day!
I am starting to get desperate about the training of my agent. I have a simple agent with ray sensors: 18 for detecting the ground and 15 for detecting the target.
The agent can move forward and backward and rotate left and right.
When an episode starts, I randomly generate the map: a simple grid where local position zero is the agent's spawn point and the target spawns at a random position. I am trying to teach the agent to reach the target.
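For clarity, here is the reset logic as a minimal Python sketch (the actual code is C# in Unity; GRID_SIZE and the names are placeholders, not my real values):

    import random

    GRID_SIZE = 10  # placeholder; my real grid dimensions differ

    def reset_episode():
        # The agent always spawns at local position zero.
        agent_pos = (0, 0)
        # The target spawns at a random cell of the grid.
        target_pos = (random.randrange(GRID_SIZE), random.randrange(GRID_SIZE))
        return agent_pos, target_pos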
The rewards are:
+1 for hitting the target
-1 for falling off the floor
-1 when reaching Max Step (1000)
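Sketched the same way in Python (the flags are placeholders for what the C# agent checks; there is no other shaping reward):

    MAX_STEP = 1000

    def step_reward(hit_target, fell_off_floor, step_count):
        """Return (reward, episode_done) for one step."""
        # +1 and end the episode when the agent touches the target.
        if hit_target:
            return 1.0, True
        # -1 and end the episode when the agent falls off the floor.
        if fell_off_floor:
            return -1.0, True
        # -1 when the episode reaches the Max Step limit.
        if step_count >= MAX_STEP:
            return -1.0, True
        # Otherwise no reward at all.
        return 0.0, False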
My config file:
behaviors:
  AgentBeh_v2:
    trainer_type: ppo
    hyperparameters:
      batch_size: 2048
      buffer_size: 32768
      learning_rate: 0.003
      beta: 5.0e-4
      epsilon: 0.2
      lambd: 0.99
      num_epoch: 3
      learning_rate_schedule: linear
      beta_schedule: constant
      epsilon_schedule: linear
    network_settings:
      normalize: true
      hidden_units: 256
      num_layers: 3
      memory:
        use_recurrent: true
        memory_size: 128
        sequence_length: 128
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 0.95
      curiosity:
        strength: 0.05
        gamma: 0.99
    max_steps: 5000000
    time_horizon: 2048
    summary_freq: 10000
The agent is unable to learn even after a few million steps; the Mean Reward never improves from -1. The agent often gets stuck in a corner of the floor, not far from its spawn point. There must be something fundamentally wrong with my config file.
Any help, any ideas are very much appreciated!
Best regards,
jl