I just tried out the 3D Ball example, and during training there was a step where the mean reward reached 100; however, by the time I stopped training later, it was down to 80.
When training stops, does it automatically save the best model, i.e. the one with the highest reward?
Thanks
INFO:mlagents.trainers: firstRun-0: 3DBallHardLearning: Step: 52000. Time Elapsed: 370.484 s Mean Reward: 100.000. Std of Reward: 0.000. Training.
INFO:mlagents.trainers: firstRun-0: 3DBallHardLearning: Step: 53000. Time Elapsed: 377.597 s Mean Reward: 70.676. Std of Reward: 34.083. Training.
INFO:mlagents.trainers: firstRun-0: 3DBallHardLearning: Step: 54000. Time Elapsed: 384.724 s Mean Reward: 62.453. Std of Reward: 39.364. Training.
INFO:mlagents.trainers: firstRun-0: 3DBallHardLearning: Step: 55000. Time Elapsed: 391.841 s Mean Reward: 80.014. Std of Reward: 28.839. Training.
^CUnityEnvironment worker: keyboard interrupt
INFO:mlagents.envs:Learning was interrupted. Please wait while the graph is generated.
INFO:mlagents.envs:Saved Model
No, it does not. It saves the latest trained model. This may seem counter-intuitive, but the system doesn’t know if it got a perfect score because it had a better policy, or if it just got lucky (which happens).
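To make the “got lucky” point concrete, here is a tiny standalone simulation (not ML-Agents code; the policy means and noise level are made-up numbers chosen to roughly match the Std of Reward in the log above). With per-episode noise this large, the checkpoint that happens to log the highest mean reward is frequently not the one with the best underlying policy:

```python
# Illustrative only: noisy per-summary mean rewards vs. the policy's true quality.
import numpy as np

rng = np.random.default_rng(0)

true_means = [60, 65, 70, 72, 75]   # hypothetical true mean reward at five checkpoints
episodes_per_summary = 20           # each logged "Mean Reward" averages a handful of episodes
reward_std = 35                     # per-episode noise, similar in scale to the log above

observed = [rng.normal(m, reward_std, episodes_per_summary).mean() for m in true_means]

print("observed mean rewards:", [float(round(o, 1)) for o in observed])
print("checkpoint with highest observed reward:", int(np.argmax(observed)))
print("checkpoint with best true policy:       ", int(np.argmax(true_means)))
# With noise this large, the two indices often disagree, which is why "save
# whatever scored highest" can end up saving a lucky checkpoint.
```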
This isn’t currently possible, but it’s something that we’d like to add support for in the future. Our internal tracking ID for this is MLA-553.
Holy god, this would be incredible! I spent a day crunching on something (it reached 100), and then something happened where it collapsed and now it’s at -30. This seems so obvious that I’m surprised it’s not part of TensorFlow training natively.
TensorFlow saves its own checkpoints, so you can resume from the last one by passing --resume to mlagents-learn on the command line, together with the run ID of the previous run (e.g. something like mlagents-learn <trainer_config> --run-id=<previous run ID> --resume).
The change to save .nn files at checkpoints should be merged next week.
Hey Celion, I deeply appreciate the response and help. Is there an internal bug tracking the idea of always keeping the highest-reward .nn file?
Sorry, there hasn’t been any progress on the issue since last time. The tracker ID is still MLA-553.
I can’t wait for this feature! It would be so incredible. For example, I am training some cars on a racetrack: they start off slowly and learn to make it all the way around the track, then they begin speeding up for more reward and get really good at it. But then they get too fast, start crashing, and their reward drops right back down to where they started. This feature would really help, as this is something that’s causing problems for me.
To normalize out the luck factor, perhaps you could use a smoothed value of the reward history (like the curve you see in TensorBoard via tensorboard --logdir) to decide when performance is good enough to make a backup? I was also thinking it could be done every 500k steps by default, the same way the .onnx file is currently updated, and the user could be given the ability to adjust the frequency.
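A minimal sketch of that idea, outside ML-Agents itself: it assumes you can observe the mean reward at each summary step and know the path of the most recently saved checkpoint (the class name, directory layout, and update hook here are all hypothetical). An exponential moving average damps one-off lucky episodes, and a copy of the checkpoint is kept only when the smoothed reward sets a new high:

```python
# Sketch only -- not part of ML-Agents. Paths and the place you call update()
# from are assumptions; adapt them to wherever your run writes its checkpoints.
import shutil
from pathlib import Path


class BestCheckpointKeeper:
    """Keep a copy of the checkpoint whose smoothed reward is the best seen so far."""

    def __init__(self, backup_dir, smoothing=0.9):
        self.backup_dir = Path(backup_dir)
        self.backup_dir.mkdir(parents=True, exist_ok=True)
        self.smoothing = smoothing     # EMA factor: higher = smoother, reacts more slowly
        self.ema = None                # smoothed reward so far
        self.best_ema = float("-inf")  # best smoothed reward seen so far

    def update(self, step, mean_reward, latest_checkpoint):
        # Exponential moving average of the logged mean reward damps lucky spikes.
        if self.ema is None:
            self.ema = mean_reward
        else:
            self.ema = self.smoothing * self.ema + (1.0 - self.smoothing) * mean_reward

        # Only back the checkpoint up when the *smoothed* reward sets a new high.
        if self.ema > self.best_ema:
            self.best_ema = self.ema
            src = Path(latest_checkpoint)
            shutil.copy2(src, self.backup_dir / f"best_step_{step}{src.suffix}")


# Hypothetical usage, wherever you can see the summary statistics:
# keeper = BestCheckpointKeeper("results/firstRun/best_checkpoints")
# keeper.update(step=52000, mean_reward=100.0,
#               latest_checkpoint="results/firstRun/3DBallHardLearning.nn")
```

The same class could also be called only every N steps to get the fixed-interval behaviour described above.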
I can’t wait for this feature. It would be incredibly helpful.