I am studying ML-Agents and making an F1-racing-themed game.
In my training, I always start with 20 cars and deactivate them as they make bad choices, such as colliding or going too far off track. But as soon as I end one episode, the framework starts a new one, which is a problem for me because I want every race to start with all 20 cars. So each car has two separate objects: one for the car itself and one for the car agent. I only disable the car itself and stop requesting decisions and actions from the agent. Once all the cars have finished, I properly end the episode. I could just disable the whole car (agent included), but then the episode end reason would always be "disabled", which is not true. Sometimes, when only a couple of cars remain, I find it more useful to report that the max step was reached instead of waiting a long time for them to finish.
1) Is the above logic right?
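To make the bookkeeping described above concrete, here is a minimal Python sketch of the idea, not ML-Agents code. `RaceManager`, `EndReason`, `min_active` and `patience_steps` are all hypothetical names and values chosen for illustration. In the Unity C# API, my understanding is that `Agent.EndEpisode()` and `Agent.EpisodeInterrupted()` are the calls that would correspond to the "real end" and "cut short" cases, but treat that mapping as an assumption.

```python
from dataclasses import dataclass, field
from enum import Enum


class EndReason(Enum):
    CRASHED = "crashed"          # car collided or went too far off track
    FINISHED = "finished"        # car completed the race
    INTERRUPTED = "interrupted"  # race cut short, not the car's fault


@dataclass
class RaceManager:
    """Hypothetical helper: parks retired cars and resets all of them together."""
    num_cars: int = 20
    min_active: int = 2        # assumed cutoff for "only a couple of cars remain"
    patience_steps: int = 500  # assumed wait before cutting the race short
    reasons: dict = field(default_factory=dict)
    steps_below_cutoff: int = 0

    def retire(self, car_id, reason):
        # Record why a car stopped; the first recorded reason wins.
        self.reasons.setdefault(car_id, reason)

    def active_cars(self):
        return [c for c in range(self.num_cars) if c not in self.reasons]

    def step(self):
        """Call once per simulation step; returns True when the race should reset."""
        active = self.active_cars()
        if not active:
            return True
        if len(active) <= self.min_active:
            self.steps_below_cutoff += 1
            if self.steps_below_cutoff >= self.patience_steps:
                # Remaining cars did nothing wrong, so mark them as interrupted.
                for car_id in active:
                    self.reasons[car_id] = EndReason.INTERRUPTED
                return True
        return False
```

When `step()` returns True, every agent's episode would be ended at the same time using the stored reason, so the next race starts with the full grid.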
From my experience, if I try to teach the agent too many things at the same time, it won't learn as fast. So I divided my training into 3 steps:
- All cars are virtual and only have to complete the racetrack. Note that the cars can see each other, but they don't collide yet.
- The cars are physical and collide with each other, and they need to complete a lap without colliding with anyone. (This takes considerably more time than step one.)
- The cars are physical and compete against each other.
2) Is this staged approach better / recommended?
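The three steps above amount to a small curriculum. As a rough sketch of how moving between them could be automated, the Python below advances to the next stage once the mean episode reward over a sliding window passes a threshold. The stage names, thresholds and window size are made-up values for illustration. ML-Agents also offers curriculum learning through environment parameters in the trainer config, which may be a better fit than hand-rolled logic like this.

```python
from collections import deque

# Hypothetical stage names for the three training steps.
STAGES = ["virtual_cars", "physical_no_contact", "physical_racing"]


class StagedCurriculum:
    def __init__(self, thresholds, window=100):
        # thresholds[i] is the mean reward needed to leave stage i (assumed values).
        self.thresholds = thresholds
        self.rewards = deque(maxlen=window)
        self.stage = 0

    def report(self, episode_reward):
        """Record one episode reward and return the stage to use next."""
        self.rewards.append(episode_reward)
        window_full = len(self.rewards) == self.rewards.maxlen
        is_last = self.stage == len(STAGES) - 1
        if window_full and not is_last:
            mean = sum(self.rewards) / len(self.rewards)
            if mean >= self.thresholds[self.stage]:
                self.stage += 1
                self.rewards.clear()  # start measuring the new stage from scratch
        return STAGES[self.stage]
```

Clearing the window on promotion avoids carrying rewards earned under easier rules into the next stage's average.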
It is very tricky to make one car want to overtake another, because overtaking is a risky process and a single collision takes both cars out of the race. And since one car gains the bonus while the other loses it, the overtaking process isn't easily learned with a standard PPO setup. Instead I opted to use self-play, but self-play is by definition meant for 1 vs 1 games, like chess. So what I ended up doing is the following: I establish a nemesis for each car on the grid, forming 10 nemesis pairs, and at the end of each episode I compare each pair's scores (which are collected in a separate place) and assign the victory to the car with more points. I know this isn't ideal because, for example, the car in first place can't overtake anyone, but this is the best I could figure out.
3) Is this acceptable / recommended?
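For reference, the nemesis scheme can be sketched in a few lines of Python. Pairing adjacent grid slots (P1 with P2, P3 with P4, and so on), the +1 / -1 / 0 result values, and treating equal scores as a draw are all assumptions made for this illustration. The function names are hypothetical too.

```python
def make_nemesis_pairs(grid_order):
    """Pair adjacent grid slots: (P1, P2), (P3, P4), ... (an assumed pairing rule)."""
    if len(grid_order) % 2 != 0:
        raise ValueError("need an even number of cars")
    return [(grid_order[i], grid_order[i + 1])
            for i in range(0, len(grid_order), 2)]


def pair_outcomes(pairs, scores):
    """Compare each pair's scores; return {car: 1.0 win, -1.0 loss, 0.0 draw}."""
    outcomes = {}
    for a, b in pairs:
        if scores[a] > scores[b]:
            outcomes[a], outcomes[b] = 1.0, -1.0
        elif scores[a] < scores[b]:
            outcomes[a], outcomes[b] = -1.0, 1.0
        else:
            outcomes[a], outcomes[b] = 0.0, 0.0
    return outcomes


# Usage: 20 cars identified by grid slot, scores collected during the episode.
pairs = make_nemesis_pairs(list(range(20)))
scores = {car: 0.0 for car in range(20)}
scores[0] = 5.0  # car 0 outscored its nemesis, car 1
results = pair_outcomes(pairs, scores)
```

Because each pair's outcomes sum to zero, every pair behaves like its own small 1 vs 1 match, which is the property self-play setups usually rely on.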