Hi!
I’ve set up a self-play game, but for some reason I see ELO consistently going down rather than up. I think it’s tied to which agent gets picked as the learning agent.
I have two agents, let’s call them Left and Right. Left is assigned to Team 0, Right is assigned to Team 1. They are listed in the order Left, then Right in the game object hierarchy. My camera follows player Left, and I assumed Left would be the training player and Right would be the snapshot swapped in from past policies.
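In case it matters, here’s a minimal sketch of how I check which behavior names the Python side actually registers (the build path is just a placeholder for my project; the mlagents_envs calls are the standard low-level API):

```python
# Minimal sketch: connect to the built environment and list the behavior
# names the trainer sees. The file_name path is a placeholder.
from mlagents_envs.environment import UnityEnvironment

env = UnityEnvironment(file_name="Builds/HelicopterSelfPlay")  # placeholder path
env.reset()

# Each team shows up as "<BehaviorName>?team=<TeamId>", i.e.
# "Helicopter?team=0" for Left and "Helicopter?team=1" for Right.
for name in env.behavior_specs:
    print(name)

env.close()
```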
However, the mlagents-learn script shows that Helicopter?team=1 is the learning player. This is rather counterintuitive. When I see a win from the point of view of Left, the game records a drop in ELO for player Right.
How does it determine the learning agent?
Why would it choose the ?team=1 player over the ?team=0 player?
Their weights are loaded in order from ?team=0 to ?team=1.
Furthermore, I think the Right agent gets into a downward spiral because it’s always facing agents from its past, all of which have a higher ELO and are favored to win again. I also recorded the demonstration from the point of view of player Left, which might work against player Right.
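For what it’s worth, my understanding of why each loss keeps pushing the rating down is just the standard Elo update; here’s a toy calculation (the K-factor and ratings are made-up illustration values, not whatever ML-Agents uses internally):

```python
# Toy Elo update: the learning agent (rated 1180) loses to a past snapshot
# rated 1220. K is an arbitrary illustration value.
K = 16.0

def expected_score(r_a: float, r_b: float) -> float:
    """Probability that player A beats player B under the Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))

learner, snapshot = 1180.0, 1220.0
exp = expected_score(learner, snapshot)           # ~0.443
new_learner = learner + K * (0.0 - exp)           # a loss scores 0, so the rating drops
print(f"expected {exp:.3f}, new rating {new_learner:.1f}")  # ~1172.9
```

So a streak of losses against those snapshots keeps subtracting points, which matches the downward curve I’m seeing.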