Cumulative Reward goes down

Taking this image into account:
[Attachment: Environment/Cumulative Reward plot]

Is there a problem with this training? The final behaviour that my agents end up with is the one I was looking for. What does it mean that at around 3.5M steps the reward starts to go down? Is this a problem? How can I explain this behaviour in my document?
Here is the entropy graph in case someone needs it. Could this be because of the reward it receives from the entropy?
[Attachment: Policy/Entropy plot]

Entropy looks good, but from my understanding I would expect the reward to increase as the entropy goes down. Is there any randomness in your environment, and how many steps does one episode last on average? Was the training resumed, or did it already reach a reward of 220 during the first 250k steps?

Yes, there is randomness in my environment, but it always happens at the end that the reward goes down. Episode length is 250. It was not resumed; it is because the summary_freq is 250000, so it reaches that reward within the first reported interval.

If the final behavior matches your expectations, I wouldn’t say this is a problem. Is it possible that your environment is evolving in some way over time such that this reward curve makes sense?

Yep, but how can I explain that this was a good training run if the graphs look bad? It doesn’t seem to be stable.

Maybe it’s because of the lack of exploration at the end of training in an environment that still changes to unknown shapes? Are you using PPO or SAC? Setting the learning rate schedule to constant instead of linear might prevent the reward decrease.
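For reference, a minimal sketch of where that setting lives in the ML-Agents trainer config YAML; the behavior name and the other hyperparameter values here are placeholders, not your actual settings:

```yaml
behaviors:
  MyAgent:                 # placeholder behavior name
    trainer_type: ppo
    hyperparameters:
      batch_size: 1024
      buffer_size: 10240
      learning_rate: 3.0e-4
      # "linear" (the default) decays the learning rate to 0 over max_steps;
      # "constant" keeps it fixed, so the policy can still adapt late in training
      learning_rate_schedule: constant
      beta: 5.0e-3
      epsilon: 0.2
    network_settings:
      hidden_units: 128
      num_layers: 2
    max_steps: 5.0e6
```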

PPO

Does it help to set it to constant?