Hey everyone,
I had an initial setup with a spinning laser, where my agent needs to go around the room collecting rewards without being hit by the laser. Picking up rewards gives positive points, and getting hit by the laser penalizes the agent with -1 and ends the round. This setup worked quite well; after training, the agent was able to navigate around.
I wanted to take it a step further: there are now 2 lasers that rotate in opposite directions, so eventually the agent MUST jump over them to keep playing. Here is how jumping works:
- The agent has a discrete action with 2 potential values, 0 or 1. If it is set to 1 and canJump is true, the agent jumps.
- Once the agent jumps, canJump becomes false. It has a 2 second cooldown that only ticks down when the agent is grounded.
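To make the rules above concrete, here is a minimal sketch of the jump logic as a plain Python state machine. It is not the actual Unity code: the JumpState class, its method names, and the choice to count the timer upward from 0 to 2 seconds are assumptions made for illustration, chosen to match the jumpTimer / 2f observation described later in the post.

```python
from dataclasses import dataclass

JUMP_COOLDOWN = 2.0  # seconds, as stated in the post


@dataclass
class JumpState:
    """Hypothetical container for the jump-related state (names are assumptions)."""
    can_jump: bool = True
    is_grounded: bool = True
    jump_timer: float = JUMP_COOLDOWN  # seconds of cooldown already elapsed

    def apply_action(self, jump_action: int) -> bool:
        """Start a jump if the discrete action is 1 and jumping is allowed."""
        if jump_action == 1 and self.can_jump:
            self.can_jump = False
            self.is_grounded = False
            self.jump_timer = 0.0
            return True
        return False

    def land(self) -> None:
        """Mark the agent as back on the ground."""
        self.is_grounded = True

    def tick(self, dt: float) -> None:
        """Advance the cooldown by dt seconds; it only progresses while grounded."""
        if self.can_jump or not self.is_grounded:
            return
        self.jump_timer = min(self.jump_timer + dt, JUMP_COOLDOWN)
        if self.jump_timer >= JUMP_COOLDOWN:
            self.can_jump = True
```

In this sketch, choosing action 1 while canJump is false does nothing, and time spent in the air does not count toward the cooldown, which is why a badly timed jump leaves the agent unable to jump again for a while after landing.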
I often see the agent jump at the wrong time and then die in between the two lasers because jumping is on cooldown. Sometimes it jumps at the right time too…
Most observations of the agent are done via ray sensors (the walls, the lasers, the rewards), but in order to further inform the agent about the jumping I added observations as such:
- The agent is passed the isGrounded boolean.
- The agent is passed the float jumpTimer / 2f, so this would be 0 when the agent jumps and slowly creeps up to 1 when the agent is ready to jump again.
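The two manual observations above could be sketched like this in plain Python. The function name jump_observations and the clamping to the range 0 to 1 are assumptions for illustration; the post only states that the value is jumpTimer / 2f.

```python
from typing import List


def jump_observations(is_grounded: bool, jump_timer: float,
                      cooldown: float = 2.0) -> List[float]:
    """Build the two jump-related observations (hypothetical helper).

    jump_timer is the number of cooldown seconds already elapsed, so the
    second value is 0 right after a jump and 1 once the agent can jump again.
    """
    # Clamp so the network never sees a value outside [0, 1] (an assumption).
    readiness = min(max(jump_timer / cooldown, 0.0), 1.0)
    return [1.0 if is_grounded else 0.0, readiness]
```

For example, an agent that has been back on the ground for one second would produce a grounded flag of 1 and a readiness of 0.5.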
Note: Every observation (ray sensors + manual observations) is stacked 2 times, so the agent uses the current and the previous frame's observations to make decisions.
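As a rough picture of what stacking 2 observations means, here is a small standalone Python sketch. The ObservationStacker class, the oldest-first ordering, and the zero padding before the first real frame are assumptions for illustration, not a description of how the ML-Agents sensors are implemented internally.

```python
from collections import deque
from typing import List, Sequence


class ObservationStacker:
    """Keeps the last num_stacks observation vectors (hypothetical helper)."""

    def __init__(self, obs_size: int, num_stacks: int = 2) -> None:
        self.obs_size = obs_size
        # Pad with zero frames until real observations arrive (an assumption).
        self.frames = deque(
            [[0.0] * obs_size for _ in range(num_stacks)], maxlen=num_stacks
        )

    def push(self, obs: Sequence[float]) -> List[float]:
        """Add the newest frame and return all frames, oldest first."""
        if len(obs) != self.obs_size:
            raise ValueError("observation has the wrong length")
        self.frames.append(list(obs))
        return [value for frame in self.frames for value in frame]
```

With 2 stacks, the policy sees at most one step of history, so it can tell whether the readiness value changed since the previous frame but nothing further back.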
What do you guys think about this approach? Should the agent be able to get a grasp of this jumping mechanic with this setup, and if not, why not? Do I just need to train for longer?
I feel like there is something wrong with the way I convey the information about jumping to the agent, so it is not able to take the correct actions.