I’m training an Agent to learn how to drive a car around a track.
While training on one track goes well, training on the others is not as smooth.
So my question is this: when using multiple Agent instances to learn the same policy (to be clear, they all have the same behavior name), is it recommended to:
Teach the agents all tracks simultaneously (e.g. have four agents on each track, all training at once)
or
On a track-by-track basis? (have all agents learn track 1, then track 2, then track 3, etc.)
It’s probably best to randomize your tracks during training, or to cycle through them with each new episode. This way you prevent the policy from overfitting to a particular track before a new one is introduced. If the tracks vary a lot in difficulty, you might want to start with the easiest ones first. Training multiple agents simultaneously is a good way to collect more experiences in a given time span, so I’d recommend having a couple of agents in the scene if that doesn’t cause any performance issues.
If you like, check out GitHub - mbaske/ml-simple-driver: Basic car controls for Unity ML-Agents - the project contains a basic procedural track generator that’s meant to help the agent generalize its behaviour.
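In case it helps, here’s a minimal sketch of what per-episode track randomization could look like from the Python side, using the mlagents_envs side channel API. The build name, the “track_id” parameter name, and the track count are my assumptions; your Unity scene would need to read the parameter (e.g. via Academy.Instance.EnvironmentParameters) and load the matching track on each reset:

```python
# Minimal sketch: per-episode track randomization via the mlagents_envs
# Python API. Assumptions: a build named "DriverBuild", a float parameter
# "track_id" that the Unity scene reads on each reset, and 4 track variants.
import random

from mlagents_envs.environment import UnityEnvironment
from mlagents_envs.side_channel.environment_parameters_channel import (
    EnvironmentParametersChannel,
)

NUM_TRACKS = 4  # assumption: four track variants built into the scene

channel = EnvironmentParametersChannel()
env = UnityEnvironment(file_name="DriverBuild", side_channels=[channel])

for episode in range(1000):
    # Pick a new track before every reset so the policy never settles on
    # a single layout; use `episode % NUM_TRACKS` to cycle instead.
    channel.set_float_parameter("track_id", float(random.randrange(NUM_TRACKS)))
    env.reset()
    # ... collect experiences / step the environment here as usual ...

env.close()
```

If you’re using the default mlagents-learn trainer rather than a custom loop, you can get the same effect by sampling an environment parameter in the trainer config YAML instead.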
It would be best to train agents on variants of the tracks simultaneously. RL agents will ‘overfit’ to their most recent training environment, so if agents are trained on track 1 and then on track 2, they will almost certainly ‘forget’ track 1 (this is known as catastrophic forgetting).
That said, it is very important that your observations are not ambiguous when training on multiple tracks. For example, if the agent’s observations are just its world coordinates, the correct action at a given coordinate may be to turn left on one track and to turn right on another. The policy cannot learn a consistent mapping from such observations to actions.
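To illustrate (a minimal sketch in plain NumPy, not tied to any ML-Agents API; the function name and the forward = (cos h, sin h) heading convention are my assumptions), expressing the next waypoint in the agent’s own frame makes identical situations on different tracks produce identical observations:

```python
# Minimal sketch of a local-frame observation. Convention assumed here:
# heading h is the yaw in radians and the car's forward direction in the
# 2D plane is (cos h, sin h).
import numpy as np

def local_waypoint_obs(agent_pos, heading, waypoint_pos):
    """Return (distance ahead, distance to the right) of the next waypoint."""
    delta = np.asarray(waypoint_pos, dtype=float) - np.asarray(agent_pos, dtype=float)
    forward = np.array([np.cos(heading), np.sin(heading)])
    right = np.array([np.sin(heading), -np.cos(heading)])
    # The same physical situation now maps to the same observation on any
    # track: "waypoint 5 m ahead, 2 m to the left" is [5.0, -2.0] everywhere.
    return np.array([delta @ forward, delta @ right])
```

On the C# side the equivalent would be something like transform.InverseTransformPoint(waypoint.position), possibly combined with raycasts toward the track borders.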
Let me know if you have any other questions. If you’d like to share your observations and reward function, I may be able to give you a better answer.