Hi, sometimes my training stops with a "dictionary changed size during iteration" error. Is anyone familiar with it? I’m using concurrent environments and ML-Agents Release 2.
This was from command prompt:
  File "c:\users\hello\desktop\project\ml-agents-release_2\ml-agents\mlagents\trainers\stats.py", line 344, in write_stats
    for key in StatsReporter.stats_dict[self.category]:
RuntimeError: dictionary changed size during iteration
This was from my build’s debug log:
Unable to save timers to file C:/Users/hello/Desktop/project/builds/7_3_2/agents2_Data\ML-Agents\Timers\Clay3D_timers.json
(Filename: C:\buildslave\unity\build\Runtime/Export/Debug/Debug.bindings.h Line: 35)
Any idea what’s happening, or ways to stop this behavior?
The stats.py error is really weird. I understand that modifying a dictionary while iterating over it is bad, but I don’t see how that could be happening here. Could you open a GitHub issue with the full call stack (and maybe some more info about your Python version)?
The “Unable to save timers to file” message should be harmless.
I had the same issue. Used Release 3 and the default 3DBall environment with SAC. Only changed the max_steps to 1 million. Occurred at around 700k steps or so. So it should (hopefully) be easy to reproduce.
I tried a few times but can’t reproduce the problem (3DBall, Release 3, SAC, max_steps=1000000). Can you please post the full call stack of the error, the command-line args you’re using to run, and the output from “python --version”?
Still can’t reproduce it, but I have a theory - I think StatsReporter is getting called from different threads simultaneously, so one thread causes a new key to be added (via add_stat or set_stat) while write_stats is being called.
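To illustrate that theory, here is a minimal sketch that forces the interleaving deterministically. It uses a plain dict as a stand-in for StatsReporter.stats_dict[self.category]; the keys and the body of add_stat are made up for illustration and are not the ml-agents implementation.

```python
import threading

# Stand-in for StatsReporter.stats_dict[category]; not the real ml-agents object.
stats = {"reward": [1.0], "loss": [0.5]}


def add_stat(key, value):
    """Hypothetical helper: what another thread might do while stats are written."""
    stats.setdefault(key, []).append(value)


def write_stats():
    """Iterate the shared dict directly, like the loop in the traceback."""
    try:
        for key in stats:
            # Force the interleaving: another thread adds a new key mid-iteration.
            other = threading.Thread(target=add_stat, args=("entropy", 0.1))
            other.start()
            other.join()
    except RuntimeError as exc:
        return str(exc)
    return None


error = write_stats()
print(error)  # dictionary changed size during iteration
```

In real training the second thread is not started from inside the loop, so the failure only shows up when the timing happens to line up, which would explain why it is intermittent.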
We were able to reproduce the problem (decreasing the summary frequency to 1 and adding a sleep in the loop makes it happen almost immediately). The PR to fix it is here: "[bug-fix] Make StatsReporter thread-safe" by ervteng (Pull Request #4201, Unity-Technologies/ml-agents on GitHub).
Thanks for reporting this!
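For anyone curious what "thread-safe" can mean here, below is a minimal sketch of the general idea. It is not the code from the PR: the class name SafeStatsReporter and its internals are hypothetical, and it simply assumes that guarding the shared dictionary with a single lock is enough.

```python
import threading
from collections import defaultdict


class SafeStatsReporter:
    """Hypothetical stand-in for a stats reporter; not the ml-agents class."""

    # Shared across all instances, so access must be serialized.
    _lock = threading.Lock()
    _stats = defaultdict(lambda: defaultdict(list))

    def __init__(self, category):
        self.category = category

    def add_stat(self, key, value):
        # Writers take the lock before touching the shared dict.
        with SafeStatsReporter._lock:
            SafeStatsReporter._stats[self.category][key].append(value)

    def write_stats(self):
        # The reader holds the same lock for the whole loop, so no key
        # can be added while it iterates.
        with SafeStatsReporter._lock:
            category_stats = SafeStatsReporter._stats[self.category]
            summary = {
                key: sum(values) / len(values)
                for key, values in category_stats.items()
                if values
            }
            category_stats.clear()
        return summary
```

The trade-off is that writers briefly block while a summary is being written, which should be negligible next to a training step.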
Great! Happy it got solved.
If this is causing a problem for training, and you’re comfortable modifying the python code, a simpler workaround is to convert the loop in question to
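The replacement loop is not shown above. One common approach, and this is an assumption about what was meant rather than the exact change, is to iterate over a snapshot of the keys, i.e. wrap the dictionary in list(...) in the for statement. A small sketch with a made-up dict and a hypothetical function name:

```python
# Stand-in for StatsReporter.stats_dict[self.category]; keys are made up.
stats_dict = {
    "Environment/Cumulative Reward": [1.0],
    "Losses/Policy Loss": [0.2],
}


def write_stats_snapshot(stats):
    """Loop over a copy of the keys so later insertions cannot break the loop."""
    written = []
    for key in list(stats):  # snapshot of the keys taken before the loop starts
        # Simulate a new stat arriving while the loop is running.
        stats.setdefault("Policy/Entropy", []).append(0.1)
        written.append(key)
    return written


written = write_stats_snapshot(stats_dict)
```

Here the loop only sees the keys that existed when the snapshot was taken; a key added mid-loop is picked up on the next write. The loop no longer iterates the live dictionary, but this is still a workaround rather than proper locking.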
The fix will be in the next release, tentatively scheduled for next week.