I trained my agent to park a car in a simple environment with the PPO algorithm, and it worked well. When I tried training it with SAC, it trains for a bit, but after a while the agent seems to stop taking any actions.
The agent gets a reward for coming closer to the parking spot and a big reward after fully parking. The agent is also reset after moving away from the parking spot or not moving for some time.
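For reference, the reward and reset rules described above look roughly like this. It is a simplified Python sketch rather than the actual Unity C# code, and the function names, thresholds, and reward sizes (`progress_scale`, `parked_bonus`, `max_dist`, `max_idle_steps`) are placeholder assumptions, not the values from my project.

```python
import math


def shaped_reward(prev_dist, curr_dist, parked,
                  progress_scale=1.0, parked_bonus=10.0):
    """Reward for one step (all constants are illustrative assumptions)."""
    if parked:
        # Big one-off reward once the car is fully parked.
        return parked_bonus
    # Positive when the car moved closer to the spot, negative when it moved away.
    reward = progress_scale * (prev_dist - curr_dist)
    if not math.isfinite(reward):
        # A NaN or infinite reward would poison training, so fail loudly.
        raise ValueError("non-finite reward: %r" % reward)
    return reward


def should_reset(curr_dist, idle_steps, max_dist=20.0, max_idle_steps=100):
    """Reset when the car is too far from the spot or has not moved for a while."""
    return curr_dist > max_dist or idle_steps >= max_idle_steps
```

For example, `shaped_reward(5.0, 4.0, parked=False)` gives a small positive reward for moving one unit closer, and `should_reset(1.0, idle_steps=100)` triggers a reset because the car has been idle too long.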
Version of Unity: 2019.4.4f1
Version of ML-agents: 1.0.6
I’ll bounce this off the team for some insight and guidance.
Hi @jednomije
Can you also share your console output? It looks like some NaNs are occurring (judging by the entropy plot).
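If it helps narrow things down, a quick check like the one below can locate the first step where a logged value stops being finite. This is only a sketch: `first_non_finite` is a hypothetical helper, not part of ML-Agents, and it assumes you have the entropy (or reward) values available as a plain list of floats, however you export them.

```python
import math


def first_non_finite(values):
    """Return the index of the first NaN or infinite value, or None if all are finite."""
    for index, value in enumerate(values):
        if not math.isfinite(value):
            return index
    return None


# Made-up numbers purely for illustration, not taken from the run in question.
entropy_log = [0.52, 0.47, float("nan"), 0.10]
bad_step = first_non_finite(entropy_log)
```

Knowing roughly where the first bad value appears makes it easier to match it against the console output around that point in training.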