Agent doesn't learn shooting behavior

Hi there.

It's my first time posting here, so I'm not sure what the structure of my post should look like; I hope it's alright :slight_smile:

I'm trying to train an agent with ML-Agents and I want to teach it to shoot arrows. It's a simple agent with two discrete action branches (sketched in code after the list):

  1. A rotation action with 5 values: 0 for doing nothing, 1 for rotating right, 2 for rotating left, 3 for rotating the spine upwards, 4 for rotating the spine downwards.
  2. A shooting action with 2 values: 0 for not shooting and 1 for shooting.
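Roughly, the action handling looks like this (the class name, the spine/rotateSpeed fields and the Shoot() helper are just placeholders to show the structure, not my exact code):

```csharp
using Unity.MLAgents;
using Unity.MLAgents.Actuators;
using UnityEngine;

public class ArcherAgentActions : Agent
{
    public Transform spine;          // bone tilted up/down by actions 3 and 4
    public float rotateSpeed = 90f;  // degrees per second

    public override void OnActionReceived(ActionBuffers actions)
    {
        float step = rotateSpeed * Time.fixedDeltaTime;

        // Branch 0: 5 values (0 = idle, 1/2 = body yaw, 3/4 = spine pitch)
        switch (actions.DiscreteActions[0])
        {
            case 1: transform.Rotate(0f,  step, 0f); break;
            case 2: transform.Rotate(0f, -step, 0f); break;
            case 3: spine.Rotate(-step, 0f, 0f);     break;
            case 4: spine.Rotate( step, 0f, 0f);     break;
        }

        // Branch 1: 2 values (0 = hold, 1 = shoot)
        if (actions.DiscreteActions[1] == 1)
            Shoot();
    }

    void Shoot()
    {
        // spawn the arrow here and hand out the terminal reward
    }
}
```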

There are also two observations for the agent:

  1. The angle difference between the vector that goes from the shooting position towards the center of the target and the vector showing the current aiming direction. It's a normalized value, so if the agent's body is oriented such that it aims at the target, this observation is close to 0, and if it's aiming in the exact opposite direction it's close to 1.

  2. A boolean observation which is true when a ray cast from the arrow spawn position in the direction the agent is aiming hits the target.

So basically we have a float observation between 0 and 1, and another observation that is only true when the agent is aiming correctly, i.e. if it shot right now, it would hit the target.
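Collecting the two observations looks roughly like this (shootPoint, target and maxRayDistance are placeholder names, not my actual fields):

```csharp
using Unity.MLAgents;
using Unity.MLAgents.Sensors;
using UnityEngine;

public class ArcherAgentObservations : Agent
{
    public Transform shootPoint;       // arrow spawn position
    public Transform target;           // center of the target
    public float maxRayDistance = 100f;

    public override void CollectObservations(VectorSensor sensor)
    {
        Vector3 toTarget = target.position - shootPoint.position;
        Vector3 aimDirection = shootPoint.forward;

        // Observation 1: angle between aim and target direction, normalized to [0, 1]
        float angleObs = Vector3.Angle(aimDirection, toTarget) / 180f;
        sensor.AddObservation(angleObs);

        // Observation 2: does a ray along the aim direction actually hit the target?
        bool onTarget = Physics.Raycast(shootPoint.position, aimDirection, out RaycastHit hit, maxRayDistance)
                        && hit.transform == target;
        sensor.AddObservation(onTarget);   // added to the vector as 0 or 1
    }
}
```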

The reward system is as follows: the agent gets a -0.001f reward every step to discourage it from standing still and doing nothing. It also has a 2000 MaxStep limit to end the episode in case the agent opts to do random things and never shoot. After shooting, if it hits the target, the agent's reward is set to 1; otherwise, it gets a negative reward calculated as (-1) * the value of the angle observation. Lastly, it also gets a +0.001f reward whenever it rotates such that the angle observation gets closer to 0, effectively making it receive no net negative or positive reward while it's rotating towards the goal.
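Sketched in code (placeholder names again, and with the episode ending right after every shot, which is how I've written this sketch), the reward bookkeeping amounts to roughly:

```csharp
using Unity.MLAgents;

public class ArcherAgentRewards : Agent
{
    float lastAngleObs = 1f;   // previous value of the angle observation (0 = aiming at the target)

    // Called once per decision step with the current angle observation.
    void ApplyStepRewards(float angleObs)
    {
        AddReward(-0.001f);                 // small living cost each step

        if (angleObs < lastAngleObs)
            AddReward(0.001f);              // cancels the cost while turning toward the target

        lastAngleObs = angleObs;
    }

    // Called when the shooting action fires.
    void ResolveShot(bool hitTarget, float angleObs)
    {
        if (hitTarget)
            SetReward(1f);                  // hit: overwrite the accumulated reward with +1
        else
            SetReward(-angleObs);           // miss: penalty scales with how far off the aim was

        EndEpisode();                       // assumed here: an episode ends after every shot
    }
}
```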

I'm training the agent with reinforcement learning alone, not using things like GAIL or behavioral cloning (which I tried and didn't get good results with). But after about 3.5 million steps, the agent just chooses the 0 value for both actions and as a result doesn't really do anything. This has been the case for about 500 thousand steps, and it has happened every time I tried retraining the agent.

I think it might be difficult for the agent to learn how to shoot at the target with these observations alone, but at the same time I would expect it to learn the behavior regardless, just over a higher number of iterations.

I would greatly appreciate it if you could give me a hint about what I'm doing wrong and how I can train this simple agent. Thanks in advance <3

Forget negative values in a sparse environment where the agent has the option to end the episode quickly or simply do nothing.

If someone punished you for not hitting a target and you always missed, would you shoot? No. And if you don't know what to do and you get a negative value all the time, whatever you do, wouldn't you be confused?
I solved a similar problem with a PID controller that always looks at the target, and the ML policy just sets the aim offset as an angle (roughly like the sketch below this paragraph).
There were no negative values, but for every hit it got 1 point, so it optimized toward shooting as often and as precisely as possible. Additionally, the targets moved and the arrow had weird aerodynamics.
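Very roughly, the idea looks like this (all the names, gains and the single continuous action are made up just to illustrate it, not my actual project code):

```csharp
using Unity.MLAgents;
using Unity.MLAgents.Actuators;
using UnityEngine;

public class PidAimAgent : Agent
{
    public Transform bow;
    public Transform target;
    public float maxOffsetDegrees = 15f;  // how far the policy may push the aim off the direct line
    public float kp = 2f, kd = 0.2f;      // controller gains (a full PID would also integrate the error)

    float previousError;

    public override void OnActionReceived(ActionBuffers actions)
    {
        // The policy's whole job: one continuous value in [-1, 1], scaled to an aim offset in degrees
        // (useful for leading a moving target or compensating for arrow drop).
        float offset = Mathf.Clamp(actions.ContinuousActions[0], -1f, 1f) * maxOffsetDegrees;

        // Desired yaw = "look straight at the target" plus the learned offset.
        Vector3 toTarget = target.position - bow.position;
        float desiredYaw = Quaternion.LookRotation(toTarget).eulerAngles.y + offset;

        // PD step toward the desired yaw; the controller, not the policy, does the aiming.
        float error = Mathf.DeltaAngle(bow.eulerAngles.y, desiredYaw);
        float turn = kp * error + kd * (error - previousError);
        previousError = error;

        bow.Rotate(0f, Mathf.Clamp(turn, -90f, 90f) * Time.fixedDeltaTime, 0f);

        // The reward stays purely positive: e.g. AddReward(1f) whenever an arrow hits the target.
    }
}
```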

If you really want to do it with ML only, you need much more training time and the curiosity reward. Maybe even give it points for shooting at all, or for every arrow that lands somewhere give a reward according to the closest the arrow got to the target while it flew (see the sketch after this paragraph).
You can try to optimize with negative values after it has trained well.
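The closest-approach shaping could look something like this (a hypothetical Arrow component; I'm assuming the arrow knows which agent fired it and which target it was aimed at):

```csharp
using Unity.MLAgents;
using UnityEngine;

public class Arrow : MonoBehaviour
{
    public Agent shooter;              // agent that fired this arrow
    public Transform target;
    public float rewardRadius = 10f;   // distance at which the shaping reward fades to zero

    float closestDistance = float.MaxValue;

    void FixedUpdate()
    {
        // Track the nearest the arrow ever gets to the target while it flies.
        closestDistance = Mathf.Min(closestDistance, Vector3.Distance(transform.position, target.position));
    }

    void OnCollisionEnter(Collision collision)
    {
        // When it lands, near misses still earn something, so there is a gradient toward the target.
        float shaped = Mathf.Clamp01(1f - closestDistance / rewardRadius);
        shooter.AddReward(shaped);   // 1.0 for a direct hit, tapering off with distance
        Destroy(gameObject);
    }
}
```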

LSTMs often help, but they are about 5x slower to train (I have a good GPU, so it's okay).