MLAgents always crashes after 500000 steps

I’m having an issue where, no matter what I do, ML-Agents crashes at exactly 500,000 steps. I’ve tried adjusting the YAML file but haven’t had any luck.

Even though it crashes, it does correctly export the .nn file at 500,000 steps.

Command:
mlagents-learn config.yaml --env=build/game --num-envs=6 --no-graphics --run-id=HunterAug1d

From config.yaml:
max_steps: 1.0e6  # also tried 1e6
2020-08-02 03:24:53 INFO [stats.py:101] PreyAgent: Step: 50000. Time Elapsed: 84.336 s Mean Reward: -10.210. Std of Reward: 4.147. Training.
2020-08-02 03:24:53 INFO [stats.py:101] HunterAgent: Step: 50000. Time Elapsed: 84.370 s Mean Reward: 10.340. Std of Reward: 4.045. Training.
2020-08-02 03:26:10 INFO [stats.py:101] HunterAgent: Step: 100000. Time Elapsed: 160.819 s Mean Reward: 9.749. Std of Reward: 4.078. Training.
2020-08-02 03:26:10 INFO [stats.py:101] PreyAgent: Step: 100000. Time Elapsed: 160.861 s Mean Reward: -9.663. Std of Reward: 4.145. Training.
2020-08-02 03:27:25 INFO [stats.py:101] HunterAgent: Step: 150000. Time Elapsed: 235.878 s Mean Reward: 8.233. Std of Reward: 4.155. Training.
2020-08-02 03:27:25 INFO [stats.py:101] PreyAgent: Step: 150000. Time Elapsed: 235.914 s Mean Reward: -8.089. Std of Reward: 4.243. Training.
2020-08-02 03:28:40 INFO [stats.py:101] HunterAgent: Step: 200000. Time Elapsed: 311.086 s Mean Reward: 8.099. Std of Reward: 4.021. Training.
2020-08-02 03:28:40 INFO [stats.py:101] PreyAgent: Step: 200000. Time Elapsed: 311.129 s Mean Reward: -7.890. Std of Reward: 4.038. Training.
2020-08-02 03:29:56 INFO [stats.py:101] PreyAgent: Step: 250000. Time Elapsed: 387.289 s Mean Reward: -8.274. Std of Reward: 4.451. Training.
2020-08-02 03:29:56 INFO [stats.py:101] HunterAgent: Step: 250000. Time Elapsed: 387.321 s Mean Reward: 8.553. Std of Reward: 4.436. Training.
2020-08-02 03:31:11 INFO [stats.py:101] PreyAgent: Step: 300000. Time Elapsed: 462.680 s Mean Reward: -7.454. Std of Reward: 4.075. Training.
2020-08-02 03:31:12 INFO [stats.py:101] HunterAgent: Step: 300000. Time Elapsed: 462.714 s Mean Reward: 7.712. Std of Reward: 4.039. Training.
2020-08-02 03:32:26 INFO [stats.py:101] PreyAgent: Step: 350000. Time Elapsed: 537.420 s Mean Reward: -7.014. Std of Reward: 3.716. Training.
2020-08-02 03:32:26 INFO [stats.py:101] HunterAgent: Step: 350000. Time Elapsed: 537.452 s Mean Reward: 7.257. Std of Reward: 3.694. Training.
2020-08-02 03:33:42 INFO [stats.py:101] PreyAgent: Step: 400000. Time Elapsed: 613.628 s Mean Reward: -7.498. Std of Reward: 4.707. Training.
2020-08-02 03:33:42 INFO [stats.py:101] HunterAgent: Step: 400000. Time Elapsed: 613.662 s Mean Reward: 7.670. Std of Reward: 4.682. Training.
2020-08-02 03:34:58 INFO [stats.py:101] PreyAgent: Step: 450000. Time Elapsed: 689.006 s Mean Reward: -8.772. Std of Reward: 5.101. Training.
2020-08-02 03:34:58 INFO [stats.py:101] HunterAgent: Step: 450000. Time Elapsed: 689.041 s Mean Reward: 8.801. Std of Reward: 5.022. Training.
2020-08-02 03:36:14 INFO [stats.py:101] PreyAgent: Step: 500000. Time Elapsed: 765.620 s Mean Reward: -9.187. Std of Reward: 5.112. Training.
2020-08-02 03:36:14 INFO [rl_trainer.py:151] Checkpointing model for PreyAgent.
2020-08-02 03:36:14 INFO [stats.py:101] HunterAgent: Step: 500000. Time Elapsed: 765.670 s Mean Reward: 9.222. Std of Reward: 5.031. Training.
2020-08-02 03:36:14 INFO [rl_trainer.py:151] Checkpointing model for HunterAgent.
2020-08-02 03:36:19 INFO [trainer_controller.py:76] Saved Model
2020-08-02 03:36:19 INFO [model_serialization.py:203] List of nodes to export for brain :PreyAgent
2020-08-02 03:36:19 INFO [model_serialization.py:205]   is_continuous_control
2020-08-02 03:36:19 INFO [model_serialization.py:205]   trainer_major_version
2020-08-02 03:36:19 INFO [model_serialization.py:205]   trainer_minor_version
2020-08-02 03:36:19 INFO [model_serialization.py:205]   trainer_patch_version
2020-08-02 03:36:19 INFO [model_serialization.py:205]   version_number
2020-08-02 03:36:19 INFO [model_serialization.py:205]   memory_size
2020-08-02 03:36:19 INFO [model_serialization.py:205]   action_output_shape
2020-08-02 03:36:19 INFO [model_serialization.py:205]   action
Converting results\HunterAug1d\PreyAgent/frozen_graph_def.pb to results\HunterAug1d\PreyAgent.nn
GLOBALS: 'is_continuous_control', 'trainer_major_version', 'trainer_minor_version', 'trainer_patch_version', 'version_number', 'memory_size', 'action_output_shape'
IN: 'vector_observation': [-1, 1, 1, 741] => 'policy/main_graph_0/hidden_0/BiasAdd'
IN: 'action_masks': [-1, 1, 1, 6] => 'policy_1/strided_slice'
IN: 'action_masks': [-1, 1, 1, 6] => 'policy_1/strided_slice_1'
OUT: 'action'
DONE: wrote results\HunterAug1d\PreyAgent.nn file.
2020-08-02 03:36:20 INFO [model_serialization.py:83] Exported results\HunterAug1d\PreyAgent.nn file
2020-08-02 03:36:20 INFO [model_serialization.py:203] List of nodes to export for brain :HunterAgent
2020-08-02 03:36:20 INFO [model_serialization.py:205]   is_continuous_control
2020-08-02 03:36:20 INFO [model_serialization.py:205]   trainer_major_version
2020-08-02 03:36:20 INFO [model_serialization.py:205]   trainer_minor_version
2020-08-02 03:36:20 INFO [model_serialization.py:205]   trainer_patch_version
2020-08-02 03:36:20 INFO [model_serialization.py:205]   version_number
2020-08-02 03:36:20 INFO [model_serialization.py:205]   memory_size
2020-08-02 03:36:20 INFO [model_serialization.py:205]   action_output_shape
2020-08-02 03:36:20 INFO [model_serialization.py:205]   action
Converting results\HunterAug1d\HunterAgent/frozen_graph_def.pb to results\HunterAug1d\HunterAgent.nn
GLOBALS: 'is_continuous_control', 'trainer_major_version', 'trainer_minor_version', 'trainer_patch_version', 'version_number', 'memory_size', 'action_output_shape'
IN: 'vector_observation': [-1, 1, 1, 741] => 'policy/main_graph_0/hidden_0/BiasAdd'
IN: 'action_masks': [-1, 1, 1, 6] => 'policy_1/strided_slice'
IN: 'action_masks': [-1, 1, 1, 6] => 'policy_1/strided_slice_1'
OUT: 'action'
DONE: wrote results\HunterAug1d\HunterAgent.nn file.
2020-08-02 03:36:20 INFO [model_serialization.py:83] Exported results\HunterAug1d\HunterAgent.nn file
2020-08-02 03:36:20 INFO [environment.py:418] Environment shut down with return code 0 (CTRL_C_EVENT).
2020-08-02 03:36:21 INFO [environment.py:418] Environment shut down with return code 0 (CTRL_C_EVENT).
2020-08-02 03:36:21 INFO [environment.py:418] Environment shut down with return code 0 (CTRL_C_EVENT).
2020-08-02 03:36:22 INFO [environment.py:418] Environment shut down with return code 0 (CTRL_C_EVENT).
2020-08-02 03:36:22 INFO [environment.py:418] Environment shut down with return code 0 (CTRL_C_EVENT).
2020-08-02 03:36:22 INFO [environment.py:418] Environment shut down with return code 0 (CTRL_C_EVENT).
2020-08-02 03:36:23 INFO [environment.py:418] Environment shut down with return code 0 (CTRL_C_EVENT).
2020-08-02 03:36:23 INFO [environment.py:418] Environment shut down with return code 0 (CTRL_C_EVENT).
2020-08-02 03:36:24 INFO [environment.py:418] Environment shut down with return code 0 (CTRL_C_EVENT).
Traceback (most recent call last):
  File "C:\Program Files\Python38\lib\multiprocessing\queues.py", line 241, in _feed
    send_bytes(obj)
  File "C:\Program Files\Python38\lib\multiprocessing\connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "C:\Program Files\Python38\lib\multiprocessing\connection.py", line 290, in _send_bytes
    nwritten, err = ov.GetOverlappedResult(True)
BrokenPipeError: [WinError 109] The pipe has been ended
2020-08-02 03:36:24 INFO [environment.py:418] Environment shut down with return code 0 (CTRL_C_EVENT).
ML-Agents package: Release 4
OS: Windows 10
Python               3.8.5
mlagents               0.18.0
mlagents-envs          0.18.0
tensorboard            2.3.0
tensorboard-plugin-wit 1.7.0
tensorflow             2.3.0
tensorflow-estimator   2.3.0
numpy                  1.18.5

From the logs, this doesn’t look like a crash: the environments shut down with return code 0 and the model is saved. It sounds like your max_steps value isn’t being read, so training stops at the default (500,000 steps). Can you post the contents of your config.yaml file?
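
For reference, here’s a minimal sketch of where max_steps is expected to sit in the Release 4 trainer config format — behavior names and hyperparameter values below are placeholders, not your actual settings. If max_steps ends up at the top level of the file instead of under a behavior entry, the trainer would silently fall back to its default:

```yaml
behaviors:
  HunterAgent:
    trainer_type: ppo
    hyperparameters:
      batch_size: 1024        # placeholder values
      learning_rate: 3.0e-4
    network_settings:
      hidden_units: 128
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
    max_steps: 1.0e6          # sibling of hyperparameters, per behavior
    time_horizon: 64
    summary_freq: 50000
  # PreyAgent: same structure, with its own max_steps
```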

Also, at the start of training, you should see something like this:
2020-08-03 11:39:36 INFO [stats.py:131] Hyperparameters for behavior name HunterAug1d

Can you post that output too?

Thanks!