Forcing PPO to be more explorative

Hi,

I’m training a number of agents. In the beginning the cumulative reward increases, but after a few hours it gets stuck at a plateau. What would be good ways (hyperparameters etc.) to make the agents explore more instead of “taking the easy way out”?

If you look at the training configuration file documentation, some parameters are associated with exploration, such as beta in PPO, which controls the strength of the entropy bonus: a higher beta keeps the policy more random for longer.
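A minimal sketch of where beta lives in the trainer config (YAML), assuming a behavior named MyBehavior; the surrounding values are just placeholders:

```yaml
behaviors:
  MyBehavior:
    trainer_type: ppo
    hyperparameters:
      learning_rate: 3.0e-4
      beta: 1.0e-2   # entropy regularization strength; raise above the 5.0e-3 default to encourage exploration
      epsilon: 0.2
```

If the entropy curve in TensorBoard drops quickly while the reward stays flat, raising beta is usually the first thing to try.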

You can also use the Curiosity intrinsic reward, which is likewise configured in the training configuration file; it rewards the agent for reaching novel states, which is supposed to increase exploration.
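For example, by adding a curiosity entry under reward_signals (the strength value here is just an illustrative starting point):

```yaml
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
      curiosity:
        gamma: 0.99
        strength: 0.02   # scale of the intrinsic reward relative to the extrinsic one
```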

You can also use GAIL to inject expert knowledge (recorded demonstrations), which can help with getting out of local optima.
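A sketch of a GAIL reward signal, assuming you have recorded a demonstration file (the demo path below is hypothetical):

```yaml
    reward_signals:
      gail:
        gamma: 0.99
        strength: 0.5                  # weight of the imitation reward
        demo_path: Demos/Expert.demo   # hypothetical path to your recorded demonstrations
```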

And the most brute-force way (if possible) is to randomly generate the environment and the agents’ starting states on each episode.
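ML-Agents supports this through environment parameter randomization in the same config file (the parameter name and range below are made up; the exact schema depends on your ML-Agents version):

```yaml
environment_parameters:
  spawn_radius:          # hypothetical parameter your scene reads at episode start
    sampler_type: uniform
    sampler_parameters:
      min_value: 1.0
      max_value: 10.0
```

On the Unity side you would then read the sampled value in your agent’s OnEpisodeBegin, e.g. via Academy.Instance.EnvironmentParameters.GetWithDefault("spawn_radius", 1.0f), and position the agent accordingly.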


Thanks a lot!