I’m updating a project I teach in my AI Flight Udemy course from v0.11 to v0.14, and I’m confused by how inconsistent my training runs have been since upgrading everything. I had to account for the Academy singleton and RayPerception changes, but that’s basically all that changed in a project that previously trained very reliably.
The training seems to be working, but then it will spontaneously flatline for up to an hour before picking back up. I made no changes between the two runs in the TensorBoard graph below. (Each 5M steps took about 1 hour and 10 minutes; a reward of -1 means the airplane crashed immediately, and a reward above 20 means it flew through 40 checkpoints without crashing.)
Does anyone know what would cause these strange flatline dips? Thanks!!
I turned off curiosity and did two more training runs, both of which worked great. So maybe the agents got curious about what would happen if they crashed into rocks for an hour…?
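For anyone who hits the same thing: turning curiosity off just means dropping (or commenting out) the `curiosity` block under `reward_signals` in `trainer_config.yaml`. Roughly what the relevant part of my config looks like; the behavior name here is a placeholder and the hyperparameter values are just illustrative, not the exact ones from my project:

```yaml
AircraftLearning:           # placeholder behavior name
  trainer: ppo
  max_steps: 5.0e6
  summary_freq: 10000
  reward_signals:
    extrinsic:
      strength: 1.0
      gamma: 0.99
    # Commenting out this block disables the curiosity intrinsic reward:
    # curiosity:
    #   strength: 0.02
    #   gamma: 0.99
    #   encoding_size: 256
```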
hmm, interesting… let me post in our internal thread
That type of behavior with Curiosity actually makes sense. Are you seeing a big spike in the Curiosity reward around those plateaus? The intrinsic reward is essentially the curiosity module's prediction error for what happens next, and crashes and failures tend to be very unpredictable (and result in all sorts of weird states), so the Curiosity module tends to find them extremely interesting.
Here’s the same graph with the Curiosity Inverse Loss overlaid. Obviously the scale on the Y-axis is different. Not sure if this explains much, but your logic makes sense.
Ohhh, now I see it. The Curiosity Value Estimate graph was hidden in TensorBoard for some reason. Yeah, it totally spikes during the troughs of the Cumulative Reward graph.