I am getting these noisy entropies. I think it's because of beta, but by how much should the entropy decrease over 1 million steps?
The image shows 5 million steps, so I imagine I have to decrease beta, but until when?
Hi, is this using PPO or SAC?
For PPO, beta should decay on its own as long as you've set your learning_rate_schedule to linear. You can also try lowering beta directly.
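For reference, a minimal ML-Agents trainer config sketch showing where these settings live — the behavior name and the specific values are just placeholders, tune them for your own run:

```yaml
behaviors:
  MyAgent:                         # placeholder behavior name
    trainer_type: ppo
    hyperparameters:
      learning_rate: 3.0e-4
      learning_rate_schedule: linear  # beta decays along with the LR
      beta: 5.0e-3                    # entropy regularization strength
    max_steps: 5.0e6                  # matches the 5M-step run above
```

With the linear schedule, beta anneals toward zero over max_steps, so entropy should trend downward over the full run rather than within any fixed 1M-step window.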
What do your value and policy losses look like? Your agent might be having trouble converging.