Hi
I was wondering if someone could describe the exact network architecture used for the PPO algorithm. Does it use two separate networks for the actor and the critic, or one shared network with separate actor and critic heads?
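To make the two options concrete, here is a rough sketch of what I mean, comparing them only by parameter count. It assumes plain fully connected layers on a vector observation. All function names and the example sizes (8 observations, 4 actions) are made up for illustration and are not taken from the actual implementation.

```python
# Hypothetical sketch, not library code: compares the two candidate
# actor-critic layouts by counting their trainable parameters.

def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


def mlp_params(sizes):
    """Total parameters of a stack of dense layers with the given sizes."""
    return sum(dense_params(a, b) for a, b in zip(sizes, sizes[1:]))


def separate_networks(obs_dim, hidden, num_layers, num_actions):
    """Option A: actor and critic each own a full copy of the hidden layers."""
    body = [obs_dim] + [hidden] * num_layers
    actor = mlp_params(body + [num_actions])  # outputs action logits
    critic = mlp_params(body + [1])           # outputs a scalar value
    return actor + critic


def shared_with_heads(obs_dim, hidden, num_layers, num_actions):
    """Option B: one shared trunk, followed by an actor head and a critic head."""
    body = [obs_dim] + [hidden] * num_layers
    trunk = mlp_params(body)
    actor_head = dense_params(hidden, num_actions)
    critic_head = dense_params(hidden, 1)
    return trunk + actor_head + critic_head


if __name__ == "__main__":
    # Assumed example sizes: 8 observations, 128 hidden units, 2 layers, 4 actions.
    print("separate:", separate_networks(8, 128, 2, 4))
    print("shared:  ", shared_with_heads(8, 128, 2, 4))
```

With the same hidden sizes, the separate layout has roughly twice the parameters of the shared one, which is part of why I would like to know which of the two is actually used.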
Also, for the “simple” visual encoder with “num_layers” set to 2 and “hidden_units” set to 128, does that mean it has 2 convolutional layers followed by 2 fully connected layers of 128 units each, and then a final layer whose size equals the number of actions?
Thanks