Hi, I’m training a car to drive around some obstacles. I’m running two versions of the agent: one with continuous actions (steering and acceleration) and one with discrete actions, where steering and acceleration take the same values as in the continuous version, discretized into N values.
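For context, the discretization works roughly like this (a minimal sketch; the bin count and value ranges below are placeholders, not my actual settings):

```python
# Hypothetical action ranges and bin count (placeholders, not my real limits).
STEER_RANGE = (-1.0, 1.0)   # continuous steering in [-1, 1]
ACCEL_RANGE = (-1.0, 1.0)   # continuous acceleration in [-1, 1]
N = 5                       # number of discrete bins per action dimension

def discrete_to_continuous(index: int, low: float, high: float, n: int = N) -> float:
    """Map a discrete action index in [0, n-1] to an evenly spaced value in [low, high]."""
    return low + (high - low) * index / (n - 1)

# Example: the discrete agent picks bin 3 for steering and bin 0 for acceleration.
steering = discrete_to_continuous(3, *STEER_RANGE)      # -> 0.5
acceleration = discrete_to_continuous(0, *ACCEL_RANGE)  # -> -1.0
```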
I’m training with PPO, and I get these two plots for the reward (Imgur link).
As you can see, while the discrete version keeps improving, the continuous version is completely stuck.
I’m not asking about anything specific to my code; I’d just like some insight into how this can happen. Could it be due to not enough exploration?
If you need any other information, just ask.
Interesting. Some questions:
- Is it at all possible (maybe due to a bug or side effect) that -1 is the maximum lifetime reward achievable in your Continuous version?
- What’s your observation stack size in the Continuous version? Today I had a variant of my training where, after adding 2 more observations, it started acting weirdly and plateaued after a while with some random spikes, not dissimilar to your case (though it plateaued at the bottom). It was fixed by removing the 2 observations again (bringing the count from 18 back to 16, ray sensors notwithstanding).
- Just in case it’s relevant, how many num_layers do you have in your YAML config settings for this training? (I have 3.)
On a side note, in my attempts at a target-finding helicopter, I ended up using Continuous, as Discrete never properly worked (I had figured that by restricting it to what are basically WASD keypress bools I’d shrink the decision space, but that wasn’t successful), and I also ended up using rather more ray sensors than fewer. I’m not 100% sure, though, which change brought the big improvement (the switch to Continuous, or the additional sensors).