I am training an ML-Agents model in which the agent has to collect food, deposit it, and every once in a while take shelter, i.e. go to a safe spot in the world and stand still. I used GAIL and BC on top of PPO. During training, my agents picked up the right actions very quickly, but a few hours in they started regressing from that behavior and began performing random, incorrect actions. I'm struggling to understand why that happened. Any insights would be really helpful.
For reference:
GAIL training parameters: strength: 0.1, use_actions: true, use_vail: true, learning_rate: 0.0009
BC training parameters: strength: 0.1
Extrinsic reward: strength: 1
Hyperparameters: beta: 0.005
Learning rate: 0.0009
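In case it is easier to read, this is roughly how those values would sit in a trainer config file. It is only a sketch that assumes the YAML layout used by recent ML-Agents releases. The behavior name `MyAgent` and both `demo_path` values are placeholders rather than my real names, and every setting I did not list above is left out.

```yaml
behaviors:
  MyAgent:                      # placeholder behavior name
    trainer_type: ppo
    hyperparameters:
      learning_rate: 0.0009
      beta: 0.005
    reward_signals:
      extrinsic:
        strength: 1.0
      gail:
        strength: 0.1
        learning_rate: 0.0009
        use_actions: true
        use_vail: true
        demo_path: path/to/demo.demo   # placeholder path
    behavioral_cloning:
      strength: 0.1
      demo_path: path/to/demo.demo     # placeholder path
```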
Need some help badly.
