In my training environment, the theoretical max reward is 800 per session, and I have a brain that gets close to this in training (screenshot: Image 2020-04-15 at 8.43.50 PM).
However, when I use this model for inference on the same agent in Unity, the mean reward I'm getting is around 20-100 per session, so in use it doesn't behave at all the way it does in training.
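For what it's worth, I'm measuring the in-Unity number by logging the agent's cumulative reward with a small helper along these lines (a rough sketch; the class and field names are placeholders I made up, though Agent.GetCumulativeReward() is part of the ML-Agents API):

```csharp
using MLAgents;     // namespace in the 0.15.x packages (Unity.MLAgents in later releases)
using UnityEngine;

// Placeholder helper: sits next to the Agent and periodically prints its
// running reward, so the in-Unity inference numbers can be compared against
// the ~800 seen during training.
public class RewardLogger : MonoBehaviour
{
    public Agent agent;            // drag the agent in via the Inspector
    public float logInterval = 5f; // arbitrary logging cadence, in seconds

    float m_NextLog;

    void Update()
    {
        if (Time.unscaledTime >= m_NextLog)
        {
            m_NextLog = Time.unscaledTime + logInterval;
            Debug.Log($"Cumulative reward: {agent.GetCumulativeReward():F1}");
        }
    }
}
```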
Does anyone have any insight on why this might be occurring? I have been trying to figure this out for days, thank you for your help!
I’ll flag this for the team to give their thoughts. In case they ask, which versions of C# and Python are you running? Additionally, do you have any console logs you can share? Thanks!
Hi,
I think this might be due to an issue with the simulation being run at large time scales. Could you try to run inference from Python? If you are using v0.15.X, run
mlagents-learn <trainer-config-file> --env=<env_name> --run-id=<run-identifier> --load --time-scale=1
Note the use of --load instead of --train, and the addition of --time-scale=1.
This should run your game using the model you trained with that run-id (it will not train, just load the pre-trained model). If the reward reported is lower than 800, it probably means that your game behaves differently at lower vs. higher time scales.
If you’re doing anything in Update rather than FixedUpdate (and are using physics at all) then you will probably see deviations…
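For instance, something like this contrived sketch will drift as the time scale goes up, because Update runs once per rendered frame while FixedUpdate tracks simulated time (all names here are made up for illustration):

```csharp
using UnityEngine;

// Contrived example: the two callbacks look interchangeable at timeScale = 1,
// but at --time-scale=20 Update still fires once per rendered frame while
// FixedUpdate fires once per physics step (Time.fixedDeltaTime of *simulated*
// time), i.e. roughly 20x more often per frame.
public class ThrusterAgent : MonoBehaviour
{
    public Rigidbody body;
    public float thrust = 10f;

    void Update()
    {
        // Time-scale SENSITIVE: the number of Update calls per second of
        // simulated time falls as the time scale rises, so a policy trained
        // at time scale 20 sees a very different environment at time scale 1.
        // body.AddForce(Vector3.up * thrust, ForceMode.Impulse);
    }

    void FixedUpdate()
    {
        // Time-scale SAFE: applied once per physics step, so the behaviour
        // matches across time scales.
        body.AddForce(Vector3.up * thrust);
    }
}
```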