So, to simplify, what I'm looking to do is:
put, say, 10 random numbers on a board, each in the range 1 ~ 10 (numbers can be duplicated etc.)
I want to let the agents (let's say there are only 2 for now) take turns picking from the numbers.
When a number is picked it is removed, and a new random number is put in its place.
At the end of, say, 5 rounds, the numbers each agent picked are added up (separately), so each agent gets a final score.
Now I want to compare the 2 results against each other: if agent 1 scores more than agent 2, agent 1 gets a higher reward, and vice versa.
Note: it's important that the scores be compared against each other, and this is where I am confused.
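For reference, the game described above can be sketched as a minimal simulation. This is plain Python with hypothetical names (`play_episode`, `greedy`), not ML-Agents code — just the environment logic:

```python
import random

POOL_SIZE = 10       # 10 numbers on the board at any time
NUM_RANGE = (1, 10)  # each number is drawn from 1..10, duplicates allowed
ROUNDS = 5           # each agent picks once per round

def play_episode(policies, rng=None):
    """Run one episode: two agents alternate picks for ROUNDS rounds.
    `policies` is a list of two functions mapping the current pool to an index."""
    rng = rng or random.Random()
    pool = [rng.randint(*NUM_RANGE) for _ in range(POOL_SIZE)]
    scores = [0, 0]
    for _ in range(ROUNDS):
        for agent in (0, 1):                     # agents take turns
            idx = policies[agent](pool)
            scores[agent] += pool[idx]           # picked number is added to the agent's score
            pool[idx] = rng.randint(*NUM_RANGE)  # replaced by a fresh random number
    return scores

# example policy: always pick the largest number currently on the board
greedy = lambda pool: max(range(len(pool)), key=lambda i: pool[i])
```

With 5 rounds and picks in 1..10, each agent's final score always falls between 5 and 50.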
Currently, when picking is complete, I compare the scores, reward each agent based on who has the higher score, end the episode, and then restart.
Will the AI learn if both agents are using the same brain at the same time but also competing? I'm thinking I should be using self-play, but I'm not sure how it works - if I set them to self-play, how do I reward them then? Is there a good example/demo I can look at?
Yes, this can be solved with self-play. You can see the self-play setup in the SoccerTwos example environment. Since it's only 5 rounds, you might be able to get away with just giving a +1 to the winning agent and a -1 to the losing agent. You'll also probably want to give the current score of the opponent as an observation, or use a sensor stack of 5.
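The +1/-1 scheme suggested above can be sketched as a simple zero-sum terminal reward. This is a hypothetical helper in plain Python (`final_rewards` is not an ML-Agents API; in an actual agent you would pass these values to `SetReward` at the end of the episode):

```python
def final_rewards(score_a, score_b):
    """Zero-sum terminal rewards: +1 to the winner, -1 to the loser, 0 each on a draw."""
    if score_a > score_b:
        return 1.0, -1.0
    if score_b > score_a:
        return -1.0, 1.0
    return 0.0, 0.0
```

Keeping the rewards zero-sum like this matters for self-play, since the trainer tracks relative skill (ELO) between the current policy and past snapshots.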