For example, in the walker environment, if normalization is set to false, it learns nothing. In the C# code the input is not scaled between 0 and 1, so the normalization must be done by network_settings > normalize, right? That's where I'm confused.
If normalize is true, does that mean the start of the network is
torch.nn.BatchNorm1d(obs_features)
torch.nn.ReLU() ?
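In other words, my current reading is something like the sketch below (this is just my interpretation in my own code, not the ML-Agents trainer; obs_features, hidden_units and num_actions are placeholder names I made up):

```python
import torch
import torch.nn as nn

class NormalizedPolicy(nn.Module):
    """My reading of normalize=true: a BatchNorm front-end on raw observations."""
    def __init__(self, obs_features: int, hidden_units: int = 128, num_actions: int = 4):
        super().__init__()
        self.net = nn.Sequential(
            nn.BatchNorm1d(obs_features),       # normalize the raw observation
            nn.ReLU(),                          # as in the two lines above
            nn.Linear(obs_features, hidden_units),
            nn.ReLU(),
            nn.Linear(hidden_units, num_actions),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # obs shape: (batch, obs_features); a batch of 1 during environment
        # interaction is exactly where I run into trouble
        return self.net(obs)
```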
I'm using my own RL code and I'm stuck here. If it does mean BatchNorm1d(), then because the agent has to interact with the environment, the network only receives one observation at a time, so the running mean and variance computed this way have a very large error, especially at the beginning. I tried it that way without success.
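To be concrete, this is the kind of single-sample running mean and variance I mean (a sketch in my own code with a Welford-style update, not the official normalizer; all names are mine):

```python
import numpy as np

class RunningNormalizer:
    """Running mean/variance of observations, updated one sample at a time."""
    def __init__(self, obs_features: int):
        self.mean = np.zeros(obs_features, dtype=np.float64)
        self.var = np.ones(obs_features, dtype=np.float64)
        self.count = 0

    def update(self, obs: np.ndarray) -> None:
        # Welford-style incremental update; very noisy while count is small,
        # which is exactly the problem I describe above.
        self.count += 1
        delta = obs - self.mean
        self.mean += delta / self.count
        self.var += (delta * (obs - self.mean) - self.var) / self.count

    def normalize(self, obs: np.ndarray) -> np.ndarray:
        return (obs - self.mean) / np.sqrt(self.var + 1e-8)
```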
Have I misunderstood, or does the official code first take random actions to obtain a more appropriate running mean and variance? (I haven't fully understood the official code.)
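By "random actions first" I mean a warm-up loop roughly like this (purely hypothetical, reusing the RunningNormalizer sketch above; my own code uses a Gymnasium-style env, and "Walker2d-v4" is only a stand-in for the Unity walker):

```python
import gymnasium as gym

# Hypothetical warm-up: fill the running statistics with observations gathered
# under a random policy before any training updates happen.
env = gym.make("Walker2d-v4")            # stand-in for the Unity walker env
normalizer = RunningNormalizer(env.observation_space.shape[0])

obs, _ = env.reset()
for _ in range(10_000):
    normalizer.update(obs)
    obs, _, terminated, truncated, _ = env.step(env.action_space.sample())
    if terminated or truncated:
        obs, _ = env.reset()

# Training would then start from statistics that are already reasonable,
# and the policy would always see normalizer.normalize(obs).
```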