Train Multiple Behavior Names With Python API

I have a simple game which has 2 different behavior names. I have 2 agents, one of each type. I would like these two agents to train against each other. I can do this using the built-in command mlagents-learn. However, I would like to customize the training, so I am writing my own script following the guide that is linked in the documentation: Google Colab

Though, in this guide (and every other one I can find), they only train 1 behavior at a time. As I said, I would like both to be trained.

How can I go about doing this?

ML-Agents will create a trainer for any behavior name that it detects in the environment and it will check the specified yaml config for how to configure the trainer for each behavior name. To see an example of this in ML-Agents, please check out our strikers vs goalie example environment.

As a side note, if this is an explicitly adversarial situation, I recommend using our self-play mechanism. To see how this is used in the asymmetric case, also check out strikers vs goalie.

I know that specifying the two behaviors in the yaml config will configure two behavior names correctly. This is not the issue.

The Goalie vs Striker example environment seems to be about using the mlagents-learn command: it comes with no Python code, only information about the Unity environment. But as I said, I am trying to implement my own version of that command.

In particular, I am concerned with the following lines of code in the cell titled “Get the Behavior Specs from the Environment” in the Google Colab notebook I linked before:

# We will only consider the first Behavior
behavior_name = list(env.behavior_specs)[0]

The notebook then only trains that particular behavior name. However, I want to train all behavior names at the same time. Like I said, this is possible using the built-in mlagents-learn command, but I am not sure how to do this using the Python code provided in the notebook. The only thing I can think of is to run separate threads for each behavior name. This appears to be what they do in the mlagents-learn command (based on looking at the file ml-agents/mlagents/trainers/learn.py), but I am not totally sure if that’s true.

Perhaps I am missing something. If so, please let me know.

Ah, I see. Sorry I misunderstood your original post.

You should be able to do this by simply generalizing the code from behavior_name to a list of behavior_names.

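# Every behavior name registered in the environment, not just the first
behavior_names = list(env.behavior_specs)

for behavior_name in behavior_names:
    # `action` must be built per behavior, from that behavior's own decision steps
    env.set_actions(behavior_name, action)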

and so on.
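
To make the "and so on" a bit more concrete, here is a minimal sketch of what the generalized loop could look like. It assumes the low-level API in recent mlagents_envs releases (env.behavior_specs, env.reset, env.get_steps, env.set_actions, env.step, and ActionSpec.random_action). The names collect_steps, choose_action, and random_policy are hypothetical helpers made up for this illustration; they are not part of ML-Agents. No threads are involved: actions are set for each behavior in turn, and a single env.step() then advances all of them.

```python
from collections import defaultdict


def collect_steps(env, choose_action, num_steps):
    """Step all behaviors together and keep one experience buffer per behavior.

    `env` is assumed to behave like mlagents_envs' UnityEnvironment.
    `choose_action(name, spec, decision_steps)` is a hypothetical callback
    returning the actions for every agent that requested a decision.
    """
    env.reset()
    behavior_names = list(env.behavior_specs)
    buffers = defaultdict(list)

    for _ in range(num_steps):
        for name in behavior_names:
            # Each behavior has its own decision and terminal steps.
            decision_steps, terminal_steps = env.get_steps(name)
            buffers[name].append((decision_steps, terminal_steps))
            if len(decision_steps) > 0:
                spec = env.behavior_specs[name]
                env.set_actions(name, choose_action(name, spec, decision_steps))
        # A single step advances the simulation for all behaviors.
        env.step()
    return dict(buffers)


def random_policy(name, spec, decision_steps):
    # Placeholder policy: one random action per agent awaiting a decision.
    return spec.action_spec.random_action(len(decision_steps))
```

In a real training script you would replace random_policy with a lookup into your own per-behavior models (for example a dict keyed by behavior name) and feed each behavior's buffer to its own update step.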

Thank you, that works.