Clarification of PPO source code

Hello!

I am studying the code of the PPO algorithm, but a few things have me confused.
Could someone help me clarify them?

Where is the activation function defined?

Where is the update function of the model?

If I want to use a new RL algorithm that requires a different hyperparameter, where do I have to change the code?

And what is the purpose of TensorNames? How are all of these connected?
I am sorry for all these questions, but I am trying to understand the code.

Thank you in advance!

We use the “swish” activation function for most models: https://github.com/Unity-Technologies/ml-agents/blob/release_5/ml-agents/mlagents/trainers/models.py#L92-L95
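For reference, swish is simply x · sigmoid(x). A minimal sketch of such a function in TensorFlow (illustrative, not copied verbatim from models.py):

```python
import tensorflow as tf

def swish(x: tf.Tensor) -> tf.Tensor:
    # Swish (Ramachandran et al., 2017): x * sigmoid(x), a smooth
    # alternative to ReLU that tends to train well for these models.
    return tf.multiply(x, tf.nn.sigmoid(x))
```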

The policy is updated in PPOTrainer._update_policy: https://github.com/Unity-Technologies/ml-agents/blob/release_5/ml-agents/mlagents/trainers/ppo/trainer.py#L145
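For context, what an update like this minimizes is PPO's clipped surrogate objective. A minimal NumPy sketch of that loss, assuming log-probabilities and advantages have already been computed (this is the textbook formula, not the repo's exact code; `epsilon` is the clipping hyperparameter):

```python
import numpy as np

def ppo_clipped_loss(new_log_probs, old_log_probs, advantages, epsilon=0.2):
    # Probability ratio r_t = pi_new(a|s) / pi_old(a|s), computed in log space
    ratio = np.exp(new_log_probs - old_log_probs)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - epsilon, 1.0 + epsilon) * advantages
    # PPO maximizes E[min(unclipped, clipped)]; negate to express it as a loss
    return -np.mean(np.minimum(unclipped, clipped))
```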

There’s no good way to add new algorithms right now; it’s something we’re going to work on in the future. Your two options are to use the low-level Python API if you want to write a trainer from scratch, or to change where the Trainers are created here. A rough sketch of the first option follows.
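As a rough picture of the low-level API, here is a random-action loop against a built environment. This assumes the release_5-era `mlagents_envs` package, where actions are plain NumPy arrays and the behavior spec exposes `action_size`; `"MyEnvironment"` is a placeholder for your own build:

```python
import numpy as np
from mlagents_envs.environment import UnityEnvironment

# file_name points at a built Unity executable (None attaches to the Editor)
env = UnityEnvironment(file_name="MyEnvironment", seed=1)
env.reset()

behavior_name = list(env.behavior_specs)[0]
spec = env.behavior_specs[behavior_name]

for _ in range(100):
    decision_steps, terminal_steps = env.get_steps(behavior_name)
    # One random continuous action per agent that is awaiting a decision
    actions = np.random.uniform(-1, 1, size=(len(decision_steps), spec.action_size))
    env.set_actions(behavior_name, actions)
    env.step()

env.close()
```

A trainer written from scratch would replace the random actions with its own policy and collect observations and rewards from `decision_steps` and `terminal_steps` itself.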

The TensorNames class contains string constants that we use to look for special tensors when loading the model, so that we can connect to them properly when doing inference. For example, the “vector_observations” tensor is created here, and we use that string to decide which tensor is the right one, so that we can check that the expected sizes are correct here.
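In other words, it is a shared-string-constant pattern: the same name is used when the tensor is created and when it is looked up again at load time. A minimal TF1-style sketch (the class and names here are illustrative, not the repo's actual definitions):

```python
import tensorflow as tf

class TensorNames:
    # Well-known names shared between model construction and model loading
    vector_observations = "vector_observations"

# At build time, the placeholder is created under the agreed-upon name...
obs = tf.placeholder(tf.float32, shape=[None, 8],
                     name=TensorNames.vector_observations)

# ...and at load/inference time the same constant finds it again,
# which also lets us verify that the sizes match what we expect.
graph = tf.get_default_graph()
found = graph.get_tensor_by_name(TensorNames.vector_observations + ":0")
assert found.shape.as_list()[1] == 8
```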

OK, thank you very much for your detailed answer!

If I use the low-level Python API, I can build the environment in Unity and import it into the script.
Will a .nn file be created at the end, so that I can see my trained agent, or will that not happen?

No, the low-level Python API doesn’t know anything about neural networks. If your goal is to produce a .nn file that can be loaded in Unity, I would not recommend the low-level Python API.

OK, thank you very much! I understand now!