I’m trying to train some agents using self-play, and I’m a bit confused by the output.
First, a behavior named “My Behavior” is detected, which I cannot find anywhere in my scene and which should not exist.
2021-04-05 22:54:35 WARNING [trainer_factory.py:60] Behavior name My Behavior does not match any behaviors specifiedin the trainer configuration file: ['ShipAI']
2021-04-05 22:54:35 INFO [stats.py:186] Hyperparameters for behavior name My Behavior:
During training, the ELO seems to update at first, but eventually it is no longer reported. Why does it suddenly stop? Also, why is the ELO decreasing while the mean group reward is positive?
The group rewards are assigned perfectly symmetrically, so any value above 0 should mean the team scored higher than its opponent.
2021-04-05 23:03:05 INFO [stats.py:180] ShipAI. Step: 20480. Time Elapsed: 517.991 s. Mean Reward: 0.000. Mean Group Reward: 0.699. Training. ELO: 1000.744.
2021-04-05 23:10:33 INFO [stats.py:180] ShipAI. Step: 25600. Time Elapsed: 965.617 s. Mean Reward: 0.000. Mean Group Reward: 0.389. Training. ELO: 998.989.
2021-04-05 23:17:54 INFO [stats.py:180] ShipAI. Step: 30720. Time Elapsed: 1406.548 s. Mean Reward: 0.000. Mean Group Reward: 0.453. Training. ELO: 998.492.
2021-04-05 23:25:15 INFO [stats.py:180] ShipAI. Step: 35840. Time Elapsed: 1848.152 s. Mean Reward: 0.000. Mean Group Reward: 0.335. Training.
2021-04-05 23:32:45 INFO [stats.py:180] ShipAI. Step: 40960. Time Elapsed: 2297.405 s. Mean Reward: 0.000. Mean Group Reward: 0.351. Training.
2021-04-05 23:39:56 INFO [stats.py:180] ShipAI. Step: 46080. Time Elapsed: 2729.359 s. Mean Reward: 0.000. Mean Group Reward: 0.434. Training.
2021-04-05 23:40:52 INFO [stats.py:180] ShipAI. Step: 51200. Time Elapsed: 2785.278 s. Mean Reward: 0.000. Mean Group Reward: 0.279. Training.
2021-04-05 23:50:49 INFO [stats.py:180] ShipAI. Step: 56320. Time Elapsed: 3381.634 s. Mean Reward: 0.000. Mean Group Reward: 0.312. Training.
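For reference, this is the textbook Elo update I have in mind when reading the numbers above. It is only a sketch of the standard formula, not ML-Agents' actual code: the function names `expected_score` and `update_elo` and the K-factor of 1.0 are my own assumptions for illustration.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that player A beats player B under the standard Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, score_a: float, k: float = 1.0):
    """Return the new (rating_a, rating_b) after one game.

    score_a is 1.0 for a win by A, 0.5 for a draw, 0.0 for a loss.
    k is an assumed K-factor; the value used by the trainer may differ.
    """
    change = k * (score_a - expected_score(rating_a, rating_b))
    # The update is zero-sum: whatever A gains, B loses.
    return rating_a + change, rating_b - change


if __name__ == "__main__":
    # Two equally rated players: a win moves A up by k / 2.
    print(update_elo(1000.0, 1000.0, 1.0))
    # A draw between equally rated players changes nothing.
    print(update_elo(1000.0, 1000.0, 0.5))
```

Under this model the rating only depends on whether each game is scored as a win, draw, or loss, not on the size of the reward. That is why I would expect a consistently positive mean group reward to go together with a rising ELO, unless the result is being derived from something other than the group reward.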
Lastly, one of the training environments seems to time out, which causes all environments to shut down and mlagents to throw a UnityTimeoutException. I haven't been able to reproduce this when running in the editor.
2021-04-05 23:52:11 INFO [subprocess_env_manager.py:220] UnityEnvironment worker 5: environment stopping.
2021-04-05 23:53:11 INFO [environment.py:431] Environment timed out shutting down. Killing…
2021-04-05 23:58:04 INFO [model_serialization.py:183] Converting to results/TestAI_5/ShipAI/ShipAI-58027.onnx
2021-04-05 23:58:04 INFO [model_serialization.py:195] Exported results/TestAI_5/ShipAI/ShipAI-58027.onnx
2021-04-05 23:58:04 INFO [torch_model_saver.py:116] Copied results/TestAI_5/ShipAI/ShipAI-58027.onnx to results/TestAI_5/ShipAI.onnx.
2021-04-05 23:58:04 INFO [model_serialization.py:183] Converting to results/TestAI_5/My Behavior/My Behavior-0.onnx
2021-04-05 23:58:04 INFO [model_serialization.py:195] Exported results/TestAI_5/My Behavior/My Behavior-0.onnx
2021-04-05 23:58:04 INFO [torch_model_saver.py:116] Copied results/TestAI_5/My Behavior/My Behavior-0.onnx to results/TestAI_5/My Behavior.onnx.
2021-04-05 23:58:04 INFO [trainer_controller.py:81] Saved Model
2021-04-05 23:58:07 INFO [environment.py:429] Environment shut down with return code 0.
2021-04-05 23:58:07 INFO [environment.py:429] Environment shut down with return code 0.
2021-04-05 23:58:07 INFO [environment.py:429] Environment shut down with return code 0.
2021-04-05 23:58:07 INFO [environment.py:429] Environment shut down with return code 0.
2021-04-05 23:58:07 INFO [environment.py:429] Environment shut down with return code 0.
2021-04-05 23:58:07 INFO [environment.py:429] Environment shut down with return code 0.
2021-04-05 23:58:07 INFO [environment.py:429] Environment shut down with return code 0.
I would appreciate any help on what is going on here.