Hi there.
It’s my first time posting here, so I’m not sure what the structure of my post should look like — hope it’s alright.
I’m trying to train an agent with ML-Agents, and I want to teach it to shoot arrows. It’s a simple agent with two discrete action branches (see the sketch after the list):
- A rotation action with 5 values: 0 for doing nothing, 1 for rotating right, 2 for rotating left, 3 for rotating the spine upwards, and 4 for rotating the spine downwards.
- A shooting action with 2 values: 0 for not shooting and 1 for shooting.
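Roughly, the action handling looks like this (a simplified sketch, not my exact code; `ArcherAgent`, `spine`, `rotationSpeed`, `spineSpeed`, and `Shoot()` are placeholder names):

```csharp
using Unity.MLAgents;
using Unity.MLAgents.Actuators;
using Unity.MLAgents.Sensors;
using UnityEngine;

public class ArcherAgent : Agent
{
    public Transform spine;           // bone rotated up/down to aim
    public Transform arrowSpawn;      // where arrows spawn, facing the aim direction
    public Transform target;
    public float rotationSpeed = 90f; // degrees per second, placeholder values
    public float spineSpeed = 45f;

    public override void OnActionReceived(ActionBuffers actions)
    {
        // branch 0: rotation (5 values), branch 1: shooting (2 values)
        int rotation = actions.DiscreteActions[0];
        int shoot = actions.DiscreteActions[1];

        switch (rotation)
        {
            case 1: transform.Rotate(0f,  rotationSpeed * Time.deltaTime, 0f); break; // right
            case 2: transform.Rotate(0f, -rotationSpeed * Time.deltaTime, 0f); break; // left
            case 3: spine.Rotate( spineSpeed * Time.deltaTime, 0f, 0f); break;        // spine up
            case 4: spine.Rotate(-spineSpeed * Time.deltaTime, 0f, 0f); break;        // spine down
            // case 0: do nothing
        }

        if (shoot == 1)
            Shoot(); // spawns the arrow and evaluates the hit
    }

    void Shoot() { /* arrow spawning omitted */ }
}
```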
There are also two observations for the agent:
- The angle difference between the vector going from the shooting position towards the center of the target and the vector of the current aiming direction. It’s a normalized value, so if the agent’s body orientation is such that it aims at the target, this observation is close to 0, and if it aims in the exact opposite direction, it’s close to 1.
- A boolean observation that is true when a ray starting from the arrow spawn position and going in the agent’s aiming direction hits the target.
So basically we have a float observation between 0 and 1, plus a boolean observation that is only true when the agent is aiming correctly, meaning that if it shot, it would hit the target.
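In code, the observations are gathered roughly like this (same placeholder `ArcherAgent` class as in the sketch above):

```csharp
public override void CollectObservations(VectorSensor sensor)
{
    // observation 1: angle between the aim direction and the direction to the target,
    // normalized from 0..180 degrees to 0..1
    Vector3 toTarget = (target.position - arrowSpawn.position).normalized;
    sensor.AddObservation(Vector3.Angle(arrowSpawn.forward, toTarget) / 180f);

    // observation 2: true when a ray along the aim direction hits the target
    bool onTarget = Physics.Raycast(arrowSpawn.position, arrowSpawn.forward, out RaycastHit hit)
                    && hit.transform == target;
    sensor.AddObservation(onTarget); // added to the vector as 0 or 1
}
```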
The reward system works like this:
- The agent gets a -0.001f reward every step, to encourage it not to just stand there doing nothing.
- There’s a MaxStep limit of 2000 to end the episode in case the agent opts to do random things and never shoots.
- After shooting, if the arrow hits the target, the agent’s reward is set to 1. Otherwise, it gets a negative reward calculated as (-1) * the value of the first observation (the angle difference).
- It also gets a +0.001f reward whenever a rotation brings the first observation closer to 0, effectively cancelling the per-step penalty while it’s rotating towards the goal.
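The corresponding reward code is roughly this (again a simplified sketch; I’m assuming the episode ends right after a shot, and that `prevAngle` caches the angle observation from the previous step):

```csharp
// called from OnActionReceived after the actions have been applied;
// 'angle' is the current value of the first observation
void AssignRewards(bool shot, bool hitTarget, float angle, float prevAngle)
{
    AddReward(-0.001f); // per-step penalty so doing nothing isn't free

    if (angle < prevAngle)
        AddReward(0.001f); // cancels the step penalty while rotating towards the target

    if (shot)
    {
        if (hitTarget) SetReward(1f);          // hit: reward set to 1
        else           SetReward(-1f * angle); // miss: penalty scales with the aiming error
        EndEpisode();
    }
}
```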
I’m training the agent with reinforcement learning alone, without things like GAIL or behavioral cloning (which I tried and didn’t get good results with). But after about 3.5 million steps, the agent just chooses the 0 value for both actions and, as a result, doesn’t really do anything. This has been the case for about 500 thousand steps, and it has happened every time I’ve retrained the agent.
I think it might be difficult for the agent to learn to shoot at the target with these observations alone, but at the same time I would expect it to learn the behavior regardless, just after a higher number of iterations.
I would greatly appreciate it if you could give me a hint about what I’m doing wrong and how I can train this simple agent. Thanks in advance <3