Thanks, mlagents. I applied it and it gives me a great model! But I can’t figure out what each layer means, so I can’t run inference without Barracuda (**I want to copy the parameters from the .pt model and write the matrix multiplications, ReLU functions, etc. myself so that I can run inference in other applications**).
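To make the goal concrete, here is a minimal sketch of the kind of hand-written inference I have in mind, in plain Python with no framework. The layer sizes and weights are toy values I made up, not values from my model, and the use of ReLU between layers is my assumption about the architecture (the exported model may use a different activation). The only thing I am relying on is that a PyTorch linear layer stores its weight with shape (out_features, in_features), so each output is a dot product of one weight row with the input, plus a bias.

```python
# Hand-written forward pass for a small fully connected network.
# All weights below are toy values, NOT copied from a real model.

def linear(x, weight, bias):
    """Compute y = W x + b, with weight given as rows of shape (out, in)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]

def relu(x):
    """Element-wise max(0, v)."""
    return [max(0.0, v) for v in x]

def argmax(x):
    """Index of the largest entry, i.e. the greedy discrete action."""
    return max(range(len(x)), key=lambda i: x[i])

def forward(x, layers):
    """Apply the linear layers in order, with ReLU between them
    (assumed activation) and no activation after the last layer."""
    for i, (weight, bias) in enumerate(layers):
        x = linear(x, weight, bias)
        if i < len(layers) - 1:
            x = relu(x)
    return x

# Toy network: 4 inputs -> 3 hidden units -> 2 action logits.
toy_layers = [
    ([[1.0, 0.0, 0.0, 0.0],
      [0.0, -1.0, 0.0, 0.0],
      [0.0, 0.0, 1.0, 1.0]], [0.0, 0.0, -1.0]),
    ([[1.0, 0.0, 1.0],
      [0.0, 2.0, 0.0]], [0.5, 0.0]),
]

logits = forward([1.0, 2.0, 3.0, 4.0], toy_layers)
action = argmax(logits)
```

In my real case the last layer would produce 162 logits instead of 2, and I would need to know which entries of the saved parameters correspond to each `(weight, bias)` pair and in what order.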

I used Netron to visualize the model, and it looks like the graph below (the input is a batch of 128 observations with 4 dimensions each; the output is a single discrete action with 162 possibilities). What do the layers mean, and how can I understand them?

I printed the ‘Policy’ part of the model. I can see that ‘network_body.observation_encoder.processors.0.normalizer.running_mean’ and ‘network_body.observation_encoder.processors.0.normalizer.running_variance’ are the mean and variance of the input, and that the input should be normalized with them before it is fed into the model. But what do ‘network_body.processors.0.normalizer.running_mean’ and ‘network_body.processors.0.normalizer.running_variance’ mean?
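For the first pair, this is how I currently assume the normalization is applied: a plain per-dimension standardization. It is only a sketch of my understanding; the function name and the `eps` term are mine, and the real normalizer may differ (for example, it might scale the variance by a step count or clip the result), which is part of what I am unsure about.

```python
import math

def normalize(obs, running_mean, running_variance, eps=1e-8):
    """Standardize each observation dimension: (x - mean) / sqrt(variance).

    eps is a small constant I added to avoid division by zero;
    it is not something I found in the saved model.
    """
    return [(o - m) / math.sqrt(v + eps)
            for o, m, v in zip(obs, running_mean, running_variance)]

# Toy values, not taken from the trained model.
normalized = normalize([3.0, 0.0], [1.0, 0.0], [4.0, 1.0])
```

If the second pair holds the same numbers as the first, I would only need to apply this step once before the first linear layer.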