Hello,
Let's take a simple example: a cube with movement plus one attack.
Two discrete branches:
- Movement, size 5
- Attack, size 2 (attack or no attack)
And let's say the attack takes 2 seconds to complete; during that time, movement is not allowed. (Because this is a simplified example, one discrete branch would be enough, but our actual project is a bit more complicated.)
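For concreteness, here's a minimal sketch of this setup as an ML-Agents agent. This assumes a recent ML-Agents version (the `SetActionEnabled` masking API); all names, sizes, and timings are made up for illustration, not our actual code:

```csharp
using Unity.MLAgents;
using Unity.MLAgents.Actuators;
using UnityEngine;

// Hypothetical agent for the cube example:
// branch 0 = movement (size 5), branch 1 = attack (size 2).
public class CubeAgent : Agent
{
    const float AttackDuration = 2f; // the attack locks the agent for 2 seconds
    float attackTimer;               // time left on the current attack

    public override void OnActionReceived(ActionBuffers actions)
    {
        // Simplification: assumes a decision is requested every FixedUpdate;
        // adjust for your DecisionRequester settings.
        if (attackTimer > 0f)
        {
            attackTimer -= Time.fixedDeltaTime;
            return; // attack in progress: movement is not allowed
        }

        int move = actions.DiscreteActions[0];   // 0 = idle, 1-4 = directions
        int attack = actions.DiscreteActions[1]; // 0 = no attack, 1 = attack

        if (attack == 1)
        {
            attackTimer = AttackDuration;
            // trigger the attack here
        }
        else
        {
            // apply the chosen movement here
        }
    }

    // Mask the movement branch while an attack is playing, so the policy
    // is never asked for actions it cannot execute anyway.
    public override void WriteDiscreteActionMask(IDiscreteActionMask actionMask)
    {
        if (attackTimer > 0f)
        {
            for (int i = 1; i < 5; i++)
                actionMask.SetActionEnabled(0, i, false); // keep only "idle"
        }
    }
}
```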
How can we keep the cube from spamming attacks all the time? I know that eventually it will reduce its use of them to optimise its reward. But because of how RL works, with a high amount of random actions at the start, the agent will obviously attack all the time, and it never tries to just walk for most of the training. This seems like a very inefficient way of learning, which is why I'm trying to find a better approach.
- Curriculum learning could be a solution, but teaching it first to reach the target feels a bit like cheating.
- Adding a cooldown on the attack just for training seems like a bad idea, since the task would then be different from the real one.
- A penalty on missed attacks doesn't seem efficient enough when used with a long horizon (see the sketch after this list).
- GAIL doesn't seem compatible with our project, because we need heavy generalisation.
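For reference, by "penalty on missed attacks" I mean shaping roughly like this inside `OnActionReceived` of the sketch above; `TryAttack()` and the penalty value are hypothetical:

```csharp
// Hypothetical shaping: small negative reward whenever an attack
// is triggered but hits nothing.
if (attack == 1)
{
    attackTimer = AttackDuration;
    bool hit = TryAttack(); // hypothetical helper: did the attack land?
    if (!hit)
        AddReward(-0.05f);  // discourage wasted attacks
}
```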
Does anyone have ideas for an approach to this problem?
Thank you