ML-Agents: multi-policy self-play

Hi everyone,
With self-play it is possible to train multiple agents in a competitive environment, but they all share the same goal, the same perception of the world, etc.; basically, they share the same policy during episodes.

In a herbivores-vs-carnivores scenario, herbivores have to learn to find plants and avoid predators, while carnivores have to learn to catch herbivores. They need different perceptions of the surrounding environment, and each needs its own policy. The reward depends on lifetime: the older an agent gets, the higher its score.
When an agent dies, it calls AddReward(-1f) and EndEpisode(), then respawns and starts a new episode in the ALREADY running environment (no environment reset, only the dead agent is reset)!
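The per-agent episode scheme above can be pictured with a small plain-Python simulation (this is not the Unity C# API). Assumptions, all hypothetical: a constant survival reward of 0.01 per step as one way to make "older means higher score", a scripted death schedule instead of real predation, and the names `SimAgent` and `run`. What it shows is that when one agent dies, only its own episode closes and restarts; the other agent's episode keeps accumulating.

```python
from dataclasses import dataclass, field

SURVIVAL_REWARD = 0.01  # hypothetical per-step reward: older agents accumulate more
DEATH_PENALTY = -1.0    # plays the role of AddReward(-1f) on death


@dataclass
class SimAgent:
    """Plain-Python stand-in for one agent (not the ML-Agents Agent class)."""
    name: str
    age: int = 0
    episode_reward: float = 0.0
    history: list = field(default_factory=list)  # (age, reward) per finished episode

    def step(self, died: bool) -> None:
        if died:
            self.episode_reward += DEATH_PENALTY
            self.end_episode()
        else:
            self.age += 1
            self.episode_reward += SURVIVAL_REWARD

    def end_episode(self) -> None:
        # Like EndEpisode(): close this agent's episode and respawn it.
        # The world and the other agents keep running untouched.
        self.history.append((self.age, round(self.episode_reward, 4)))
        self.age = 0
        self.episode_reward = 0.0


def run(steps: int, deaths: dict) -> dict:
    """deaths maps a step index to the set of agent names that die at that step."""
    agents = {n: SimAgent(n) for n in ("Carnivore", "Herbivore")}
    for t in range(steps):
        dying = deaths.get(t, set())
        for name, agent in agents.items():
            agent.step(died=name in dying)
    return agents


if __name__ == "__main__":
    world = run(steps=10, deaths={3: {"Herbivore"}})
    for a in world.values():
        print(a.name, a.age, round(a.episode_reward, 4), a.history)
```

With a Herbivore dying at step 3 of 10, its history records one finished episode (age 3, reward -0.97) and it restarts at age 0, while the Carnivore is still in its first episode at age 10. In Unity the survival term would presumably be a small AddReward call made every step; that is a design choice, not something this thread specifies.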

I made a simple environment named EnvSym, gave the agents different behavior names (Carnivore and Herbivore), created a config file named EnvSym.yaml, and launched training.

Unity does connect with 2 brains and the names are correct (Carnivore, Herbivore), but their parameters look like the defaults rather than my config. This already happened once because the name of the config file didn't match the name of the behavior.
I tried making 2 config files named after the behaviors, but I don't know if there's a command to pass 2 different files at once:
mlagents-learn config/ppo/??? --run-id=EnvSym01

Is there a way to define different configs in the same file? Or should they be kept separate?

Is it correct to split carnivores and herbivores into 2 different teams? They are not really a “team”: cooperation can be useful, but the goal is still “survive as long as you can on your own”.

And the most important question: is my project even possible right now with ML-Agents?

I talked to our resident self-play expert: this should work with self-play. Just be aware that the Elo rating may not be a useful metric in your scenario, since it was designed for zero-sum games, and yours doesn't sound like one.
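To see where the zero-sum assumption sits in Elo, here is the standard textbook Elo update as a sketch (not ML-Agents' own implementation; K = 16 and the 1200 starting ratings are illustrative choices). Each match has to be scored as win/draw/loss for one side, with the two scores summing to 1, so whatever one side gains the other loses. In a predator-prey world where both species can do well (long lifetimes) in the same episode, there may be no natural single result to feed in as `score_a`.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Expected score of A against B under the Elo model (between 0 and 1)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def elo_update(r_a: float, r_b: float, score_a: float, k: float = 16.0):
    """score_a is 1 for an A win, 0.5 for a draw, 0 for an A loss."""
    e_a = expected_score(r_a, r_b)
    e_b = 1.0 - e_a
    score_b = 1.0 - score_a  # zero-sum: B's result is fully determined by A's
    return r_a + k * (score_a - e_a), r_b + k * (score_b - e_b)


if __name__ == "__main__":
    # A beats B from equal ratings: A gains exactly what B loses.
    print(elo_update(1200.0, 1200.0, score_a=1.0))  # (1208.0, 1192.0)
```

Because the total rating is conserved by construction, the number only tracks relative strength in head-to-head outcomes; it says nothing about whether both populations are surviving longer.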

For the config files, have a look at the StrikersVsGoalie config in the Unity-Technologies/ml-agents GitHub repository: config/ppo/StrikersVsGoalie.yaml on the release_4 branch.

Yes, I found by trial and error that I can put multiple behaviors in the same config file, and training starts just fine.
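Since `mlagents-learn` takes a single config path, the one file holding both behaviors is what gets passed, e.g. `mlagents-learn config/ppo/EnvSym.yaml --run-id=EnvSym01` (assuming EnvSym.yaml sits in config/ppo/). The plain-Python model below (not ML-Agents code) sketches the lookup behind the "default parameters" symptom described earlier: one file with a `behaviors` mapping keyed by behavior name, and any name without an entry quietly ending up with defaults. The DEFAULTS values are placeholders, not ML-Agents' real defaults; the field-by-field merge is an assumption about how unspecified fields get filled in; `settings_for` and `merge` are hypothetical names.

```python
import copy
from typing import Dict, Tuple

# Placeholder values, NOT ML-Agents' real defaults.
DEFAULTS: Dict = {
    "trainer_type": "ppo",
    "hyperparameters": {"batch_size": 1024, "learning_rate": 3.0e-4},
    "max_steps": 500000,
}

# Mirrors one YAML file holding both behaviors under a single "behaviors:" key.
CONFIG: Dict = {
    "behaviors": {
        "Carnivore": {"hyperparameters": {"learning_rate": 1.0e-4}, "max_steps": 5000000},
        "Herbivore": {"hyperparameters": {"batch_size": 2048}},
    }
}


def merge(base: Dict, override: Dict) -> Dict:
    """Recursively overlay `override` onto a copy of `base`."""
    out = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = merge(out[key], value)
        else:
            out[key] = copy.deepcopy(value)
    return out


def settings_for(behavior_name: str, config: Dict) -> Tuple[Dict, bool]:
    """Return (settings, found); unknown names silently receive the defaults."""
    behaviors = config.get("behaviors", {})
    if behavior_name not in behaviors:
        return copy.deepcopy(DEFAULTS), False
    return merge(DEFAULTS, behaviors[behavior_name]), True


if __name__ == "__main__":
    for name in ("Carnivore", "Herbivore", "Carnivores"):  # last one is a typo
        settings, found = settings_for(name, CONFIG)
        print(name, "from config" if found else "DEFAULTS", settings)
```

The third name, `Carnivores`, mimics a mismatch between the Behavior Name set in Unity and the key in the file: in this model it gets DEFAULTS without raising an error. In a real config each behavior block can also carry its own `self_play:` section, as the StrikersVsGoalie example does.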

Thanks for the answer, I'll turn self-play back on.

If I get something good, I'll let you know 🙂