ELO calculation in an ML-Agents self play training process

I'm looking for some help understanding the Elo results in the training process for the following game I created:

The game is a symmetric 17+4 variant for 2 players (a Blackjack variant with 32 cards, card values that differ from Blackjack, and no dealer).

In the first example the values of the 8 cards in every suit are 1, 2, 3, 4, 5, 6, 7, 8. This seems to work well: PPO with self-play delivers these graphs for the mean rewards and Elo:

[Attached graphs: cumulative reward and Elo for the first example]

If I play against the trained model, it plays very well; I couldn't detect any mistakes made by the model.

In a second example the values of the 8 cards in every suit are 2, 3, 4, 7, 8, 9, 10, 11. The same PPO / self-play training now delivers these graphs for the mean rewards and Elo:

[Attached graphs: cumulative reward and Elo for the second example]

There is a small bias in the cumulative rewards and, after roughly 1 million steps, a decrease in Elo.
But the code has been reviewed intensively, so I am quite confident the bias isn't caused by my code. And the trained model again plays very well against a human player, so the decreasing Elo puzzles me.

My questions: Do you have an idea what could cause such effects? And where can I find really detailed documentation of the training process and the Elo calculation?

Thanks a lot!

Hi @Streiicher,
You can find the documentation for self-play here. I will reach out to someone on our research team to see if they can answer your questions.
Cheers,
Chris

Hi @Streiicher

It looks like the final rewards are negative. The Elo calculation assumes the final reward determines the winner, i.e. a positive reward indicates a win, a negative reward a loss, and zero a draw. So what appears to be happening is that the agent is always 'losing'.
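For illustration, here is a minimal sketch of that kind of Elo update, reading the result from the sign of the final reward (simplified Python, not the actual trainer code; the helper name and K-factor are just placeholders):

```python
def elo_update(rating_a: float, rating_b: float,
               final_reward_a: float, k: float = 16.0):
    """Illustrative Elo update: the game result is read from the sign of
    player A's final reward (positive = win, negative = loss, zero = draw)."""
    if final_reward_a > 0:
        score_a = 1.0
    elif final_reward_a < 0:
        score_a = 0.0
    else:
        score_a = 0.5

    # Expected score for A under the standard Elo formula.
    expected_a = 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))

    delta = k * (score_a - expected_a)
    return rating_a + delta, rating_b - delta


# If the learning agent's final reward is always negative, its rating keeps
# drifting downward even against an equally rated opponent:
print(elo_update(1200.0, 1200.0, -1.0))  # -> (1192.0, 1208.0)
```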

If it doesn’t seem possible to specify a reward function that satisfies this for your game, please let me know and we can try to help.

Hi @andrewcoh_unity, thanks for your reply. The reward structure is: +1 for a win, -1 for a loss, and 0 for a draw. In the documentation Christopher sent earlier I can't find anything about the Elo calculation. Can you give me a hint where to look this up?
Best, Martin

Hi @Streiicher

Here is the documentation: ml-agents/docs/Training-Configuration-File.md at main · Unity-Technologies/ml-agents · GitHub
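For reference, the self-play parameters (including the starting rating used for the Elo curve) live in the trainer configuration file. A rough example, with a made-up behavior name and example values only:

```yaml
behaviors:
  SeventeenPlusFour:   # hypothetical behavior name
    # ... trainer_type, hyperparameters, reward_signals, etc. ...
    self_play:
      save_steps: 20000                     # how often a policy snapshot is saved as an opponent
      team_change: 100000                   # steps between switching which team is learning
      swap_steps: 2000                      # steps between swapping the opponent snapshot
      window: 10                            # number of past snapshots kept in the opponent pool
      play_against_latest_model_ratio: 0.5  # fraction of games played against the newest snapshot
      initial_elo: 1200.0                   # starting Elo rating used in the calculation
```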