My agent is not learning, Mean Reward is not increasing

Hi everyone,

I hope you are having a fantastic day!

I am starting to get desperate about the training of my agent. I have a simple agent with ray sensors: 18 for detecting the ground and 15 for detecting the target.
The agent can move forward and backward and rotate left and right.
When an episode starts, I randomly generate the map. It is a simple grid where local position zero is the agent's spawn point and the target spawns at a random position. I am trying to teach the agent to reach the target.
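
In code, the episode setup is roughly this (a minimal sketch; the class name, field names, and grid size are placeholders, not my exact code):

using UnityEngine;
using Unity.MLAgents;

public class MazeAgent : Agent
{
    public Transform target;   // the object the agent should reach
    public int gridSize = 10;  // placeholder size of the generated grid

    public override void OnEpisodeBegin()
    {
        // The agent always spawns at local position zero...
        transform.localPosition = Vector3.zero;

        // ...and the target spawns at a random grid cell.
        target.localPosition = new Vector3(
            Random.Range(0, gridSize), 0f, Random.Range(0, gridSize));
    }
}
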
The rewards are:
+1 for hitting the target
-1 for falling off the floor
-1 when reaching Max Step (1000)
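
Continuing the sketch above, the reward handling in the same class looks roughly like this (the "Target" tag, the step sizes, and the fall-height check are assumptions, not my exact code):

using UnityEngine;
using Unity.MLAgents;
using Unity.MLAgents.Actuators;

public class MazeAgent : Agent
{
    public override void OnActionReceived(ActionBuffers actions)
    {
        // Branch 0: 0 = no-op, 1 = forward, 2 = backward
        int move = actions.DiscreteActions[0];
        // Branch 1: 0 = no-op, 1 = rotate left, 2 = rotate right
        int turn = actions.DiscreteActions[1];

        if (move == 1) transform.Translate(Vector3.forward * 0.1f);
        if (move == 2) transform.Translate(Vector3.back * 0.1f);
        if (turn == 1) transform.Rotate(0f, -5f, 0f);
        if (turn == 2) transform.Rotate(0f, 5f, 0f);

        // -1 for falling off the floor (the height check is a placeholder
        // for however the fall is detected).
        if (transform.localPosition.y < -1f)
        {
            SetReward(-1f);
            EndEpisode();
        }

        // Max Step (1000) is set on the Agent component; ML-Agents resets
        // the episode there on its own, so the -1 is added on the last step.
        if (StepCount >= MaxStep - 1)
        {
            AddReward(-1f);
        }
    }

    void OnCollisionEnter(Collision collision)
    {
        // +1 for hitting the target.
        if (collision.gameObject.CompareTag("Target"))
        {
            SetReward(1f);
            EndEpisode();
        }
    }
}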

My config file:

behaviors:
  AgentBeh_v2:
    trainer_type: ppo
    hyperparameters:
      batch_size: 2048
      buffer_size: 32768
      learning_rate: 0.003
      beta: 5.0e-4
      epsilon: 0.2
      lambd: 0.99
      num_epoch: 3
      learning_rate_schedule: linear
      beta_schedule: constant
      epsilon_schedule: linear
    network_settings:
      normalize: true
      hidden_units: 256
      num_layers: 3
      memory:
        use_recurrent: true
        memory_size: 128
        sequence_length: 128
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 0.95
      curiosity:
        strength: 0.05
        gamma: 0.99
    max_steps: 5000000
    time_horizon: 2048
    summary_freq: 10000

The agent is unable to learn even after a few million steps; the Mean Reward does not improve beyond -1. The agent often gets stuck in a corner of the floor, not far from its spawn point. There has to be something fundamentally wrong with my config file.

Any help, any ideas are very much appreciated!

Best regards,
jl



Hi everyone,

I managed to resolve the issue. For others who might be facing similar problems, here is what helped me:

I was adding the negative reward once the agent fell off the “white ground” and hit the “lava” below. However, from the point the agent left the ground until it hit the lava, there was about a second during which the agent was falling while still able to “move”.
I suspect that the agent failed to connect the negative reward with the act of falling off the white ground.
I adjusted the scene so that the agent gets the negative reward immediately once it leaves the white ground. From that point on, the agent was able to learn rather quickly; after 200k steps it could solve even slightly more complicated mazes.
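
For illustration, one way to implement the immediate penalty is a short downward raycast (a sketch of one possible setup; the "WhiteGround" tag and the component layout are assumptions):

using UnityEngine;
using Unity.MLAgents;

public class GroundCheck : MonoBehaviour
{
    Agent agent;

    void Awake()
    {
        agent = GetComponent<Agent>();
    }

    void FixedUpdate()
    {
        // Short ray straight down: if it no longer hits the white ground,
        // penalize and reset immediately instead of waiting for the lava hit.
        if (!Physics.Raycast(transform.position, Vector3.down, out RaycastHit hit, 1.5f)
            || !hit.collider.CompareTag("WhiteGround"))
        {
            agent.SetReward(-1f);
            agent.EndEpisode();
        }
    }
}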
