PPO On-Policy or Off-policy

I have read the book "Learn Unity ML-Agents – Fundamentals of Unity Machine Learning: Incorporate new powerful ML algorithms such as Deep Reinforcement Learning for games", and it says that PPO is off-policy. However, in this link:
https://stats.stackexchange.com/questions/427140/is-proximal-policy-optimization-ppo-an-on-policy-reinforcement-learning-algori#:~:text=TRPO%20and%20PPO%20are%20both,far%20from%20the%20underlying%20objective.
they say that it is on-policy.
Can someone help me? Why is it on-policy, or why is it off-policy?

PPO is an on-policy algorithm. It is on-policy because each update uses rollouts collected by the current policy, and the clipped surrogate objective keeps the updated policy close to the one that gathered the data; those rollouts are then discarded rather than stored in a replay buffer, unlike in off-policy methods such as DQN or SAC. You can learn more by reading the paper: https://arxiv.org/abs/1707.06347
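To make the on-policy point concrete, here is a minimal sketch (not ML-Agents' actual code) of PPO's clipped surrogate loss. The `ratio` argument is pi_new(a|s) / pi_old(a|s) computed on data just collected by the current policy; the clip range `eps` is the standard 0.2 from the paper:

```python
import numpy as np

def ppo_clip_loss(ratio, advantage, eps=0.2):
    """Clipped surrogate loss from the PPO paper (negated, for minimization).

    ratio:     pi_new(a|s) / pi_old(a|s) per sample, where pi_old is the
               policy that collected the data (hence on-policy).
    advantage: advantage estimates from those same fresh rollouts.
    """
    unclipped = ratio * advantage
    # Clipping the ratio to [1 - eps, 1 + eps] removes the incentive to
    # move the new policy far from the data-collecting policy.
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # Taking the minimum gives a pessimistic (lower) bound on the objective.
    return -np.minimum(unclipped, clipped).mean()

# Example: a ratio of 1.5 with positive advantage is clipped down to 1.2,
# so the gradient stops pushing the policy further away.
print(ppo_clip_loss(np.array([1.5]), np.array([1.0])))
```

Because the loss is defined relative to the policy that produced the samples, the data must be regenerated after each policy update, which is exactly what makes PPO on-policy even though it reuses each batch for a few epochs.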


Does the "actor" network for the agent in PPO share its weights with the "critic" network when estimating the baseline for the advantage function? Or does Unity generate a copy of the neural network with independent weights for the actor and the critic? Many thanks!

In the latest version, ML-Agents generates a copy of the network with independent weights for the actor and the critic. In some older versions, the weights were shared, but only for discrete-action networks.
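As an illustration of what "independent weights" means (a hypothetical sketch, not ML-Agents' internal implementation), the actor and critic can be built as two separate parameter sets with no shared trunk, so a critic update can never move the policy's weights:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_mlp(in_dim, hidden, out_dim):
    # Each call allocates a fresh, independent set of weights.
    return {
        "W1": rng.normal(0.0, 0.1, (in_dim, hidden)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(0.0, 0.1, (hidden, out_dim)),
        "b2": np.zeros(out_dim),
    }

def forward(params, x):
    h = np.tanh(x @ params["W1"] + params["b1"])
    return h @ params["W2"] + params["b2"]

obs_dim, act_dim = 8, 2          # illustrative sizes, not ML-Agents defaults
actor = make_mlp(obs_dim, 32, act_dim)   # policy head: action parameters
critic = make_mlp(obs_dim, 32, 1)        # value head: scalar baseline V(s)

# The two dicts hold distinct arrays: optimizing the critic's value loss
# cannot touch the actor's weights, and vice versa.
assert all(actor[k] is not critic[k] for k in actor)
```

A shared-trunk variant would instead compute `h` once and attach both heads to it, which saves computation but couples the two losses through the shared layers; that is the trade-off behind the older discrete-action behavior mentioned above.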


Thank you for the reply!