Best Practice for an Ability with a "Cooldown" in ML-Agents

Hey everyone,

I had this initial setup where I have a spinning laser, and my agent needs to go around the room collecting rewards without being hit by the laser. Picking up a reward gives plus points, and getting hit by the laser gives the agent a -1 penalty and ends the round. This setup worked quite well: after training, the agent was able to navigate around.

I wanted to take it a step further, so there are now 2 lasers that rotate in opposite directions, and eventually the agent MUST jump over them to keep playing. Here is how jumping works:

  • The agent has a discrete action with 2 possible values, 0 or 1. If it is set to 1 and canJump is true, the agent jumps.
  • Once the agent jumps, canJump becomes false. It has a 2-second cooldown that only ticks down while the agent is grounded.
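
To make the rules above concrete, here is a rough, engine-agnostic sketch in Python (not Unity code and not the ML-Agents API). The names JumpState, step_jump and dt are made up for illustration, and the order of "tick the timer, then check the action" is an assumption on my part.

```python
from dataclasses import dataclass

COOLDOWN_SECONDS = 2.0  # the 2-second cooldown described in the post


@dataclass
class JumpState:
    """Hypothetical container for the jump ability's state."""
    can_jump: bool = True
    jump_timer: float = COOLDOWN_SECONDS  # seconds of cooldown elapsed so far


def step_jump(state: JumpState, action: int, is_grounded: bool, dt: float) -> bool:
    """Advance the cooldown by dt seconds; return True if a jump starts this step."""
    # The cooldown only advances while the agent is on the ground.
    if not state.can_jump and is_grounded:
        state.jump_timer = min(state.jump_timer + dt, COOLDOWN_SECONDS)
        if state.jump_timer >= COOLDOWN_SECONDS:
            state.can_jump = True
    # Action 1 requests a jump; it is ignored while the ability is on cooldown.
    if action == 1 and state.can_jump:
        state.can_jump = False
        state.jump_timer = 0.0
        return True
    return False
```

Note that a jump requested during the cooldown is silently dropped, which is exactly the situation where the agent gets stuck between the two lasers.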

I often see the agent jump at the wrong time and then die in between the two lasers because the jump is on cooldown. Sometimes it jumps at the right time too…

Most observations of the agent are done via ray sensors (the walls, the lasers, the rewards), but in order to further inform the agent about the jumping I added observations as such:

  • Agent is passed isGrounded boolean.
  • Agent is passed float jumpTimer / 2f, so this is 0 right after the agent jumps and slowly creeps up to 1, at which point the agent is ready to jump again.
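
The two manual observations above could be built roughly like this. Again this is a plain Python sketch, not ML-Agents code: jump_observations is a hypothetical helper, and the clamp to [0, 1] is my own addition in case the timer ever overshoots.

```python
from typing import List

COOLDOWN_SECONDS = 2.0  # same 2-second cooldown as in the post


def jump_observations(is_grounded: bool, jump_timer: float) -> List[float]:
    """Return [grounded flag, normalised cooldown progress] as floats."""
    # jump_timer / 2 is 0 right after a jump and 1 once the jump is ready again.
    readiness = min(max(jump_timer / COOLDOWN_SECONDS, 0.0), 1.0)
    return [1.0 if is_grounded else 0.0, readiness]
```

For example, an agent in mid-air with 1 second of cooldown already elapsed would see [0.0, 0.5].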

Note: All observations (ray sensors + manual observations) are stacked 2 deep, so the agent uses the current and previous frame’s observations to make decisions.
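
For anyone unfamiliar with stacking, this is roughly what "2 stacked" means. It is a plain Python sketch with a made-up ObservationStack class; the oldest-first ordering and the zero padding before the first frame are assumptions, and ML-Agents' own layout may differ.

```python
from collections import deque
from typing import Deque, List


class ObservationStack:
    """Hypothetical helper that keeps the last `size` observation frames."""

    def __init__(self, size: int, obs_length: int) -> None:
        # Start with all-zero frames so the output length is constant from step one.
        self.frames: Deque[List[float]] = deque(
            [[0.0] * obs_length for _ in range(size)], maxlen=size
        )

    def push(self, obs: List[float]) -> List[float]:
        """Add the newest frame and return all kept frames flattened, oldest first."""
        self.frames.append(list(obs))
        return [value for frame in self.frames for value in frame]
```

With two frames of the cooldown value side by side, the network can in principle see whether the timer moved since the last frame, that is, whether the cooldown is ticking or frozen in the air.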

What do you guys think about this approach? Should the agent be able to get a grasp of this jumping mechanic with this setup, and if not, why not? Do I just need to train for longer?

I feel like there is something wrong with the way I convey the information about jumping to the agent, and that this is why it is not able to take the correct actions.

Without any context, it’s going to just jump randomly while it learns the task. Yes, it should be fine and will learn eventually, but it will likely jump more than you intended even after it learns to avoid the lasers, unless you penalise it a little for jumping to stop it spamming the jump.
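
A small jump penalty like the one suggested above might look something like this. It is a plain Python sketch, not ML-Agents code: step_reward is a hypothetical helper, and the -0.01 penalty and +1 pickup reward are guesses for illustration (only the -1 for a laser hit comes from the post), so they would need tuning.

```python
JUMP_PENALTY = -0.01       # hypothetical value, small compared to the other rewards
LASER_HIT_PENALTY = -1.0   # from the post: a laser hit gives -1 and ends the round


def step_reward(jumped: bool, pickups: int, hit_by_laser: bool,
                pickup_reward: float = 1.0) -> float:
    """Sum the reward for one step; pickup_reward is an assumed value."""
    reward = pickups * pickup_reward
    if jumped:
        # Charge a small cost per jump so that spamming the jump is not free.
        reward += JUMP_PENALTY
    if hit_by_laser:
        reward += LASER_HIT_PENALTY
    return reward
```

The penalty should stay well below the cost of being hit, otherwise the agent may learn to avoid jumping altogether.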