Hi, I am currently working on a Reinforcement Learning project; I came across ML-Agents a while ago and have been using it for some time.
I am trying to set up several playing fields for my agents to play in at the same time (to speed up training), but in the Python API I realized that the agent_ids are not assigned sequentially by playing field. As a result, I cannot tell which agents are acting in the same field (the agent ids are jumbled up; see the snippet below), and I need this information because I am trying to implement multi-agent RL. I can think of two approaches: first, assign different team ids so the agents can be told apart; second, hard-code the agent_ids that appear in each DecisionSteps.
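For reference, this is roughly how I am inspecting the ids (a minimal sketch using the low-level mlagents_envs Python API; the build path is a placeholder):

```python
from mlagents_envs.environment import UnityEnvironment

# Placeholder build path; file_name=None would attach to the Unity Editor instead.
env = UnityEnvironment(file_name="MyEnvironmentBuild")
env.reset()

for behavior_name in env.behavior_specs:
    decision_steps, terminal_steps = env.get_steps(behavior_name)
    # agent_id is a numpy array of per-agent ids; they are unique, but not
    # ordered or grouped by the playing field the agents were placed in.
    print(behavior_name, decision_steps.agent_id)

env.close()
```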
So my question is: if I set up the playing fields with different teams (e.g. field 1 has teams 1 and 2, field 2 has teams 3 and 4, and so on), will training still be consolidated across the behaviors? I ask because doing this creates more brains, and I am not sure whether their training will be consolidated.
Also, is it possible to hard-code the agent_ids? I believe they are assigned on the Unity side rather than in the Python API. If it is possible, how do I access this initialization?
Any help will be appreciated. Thank you!
Sorry if I misunderstood.
All agents with the same Behavior Name share their learning. With teams, the active team learns while the other / ghost teams use old snapshots. Is there a reason you need the agent_id to match the field the agent is on? Learning should be independent of the field, IIUC.
What I am attempting is PPO in a multi-agent setting. I want to feed joint observations (the concatenated observations of a few agents) into the critic network, while the actor network takes in only local observations. I need the agent_ids to properly segregate which agents will share their observations.
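Concretely, this is the kind of grouping I want to do before the critic update (a sketch, assuming I already know which agent_ids belong to the same field; `field_of` is a hypothetical dict I still need a reliable way to build, which is exactly the problem):

```python
import numpy as np
from collections import defaultdict

def joint_observations(decision_steps, field_of):
    """Concatenate the observations of agents that share a playing field.

    decision_steps : a DecisionSteps returned by env.get_steps(behavior_name)
    field_of       : hypothetical dict mapping agent_id -> field index
    Returns {field: {agent_id: joint_obs}} where joint_obs is the agent's own
    observation followed by its field-mates' observations (critic input);
    the actor would still only see the agent's local observation.
    """
    by_field = defaultdict(list)
    for i, agent_id in enumerate(decision_steps.agent_id):
        obs = decision_steps.obs[0][i]  # assumes a single vector observation
        by_field[field_of[agent_id]].append((agent_id, obs))

    joint = {}
    for field, members in by_field.items():
        joint[field] = {
            aid: np.concatenate([obs] + [o for other, o in members if other != aid])
            for aid, obs in members
        }
    return joint
```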
One way I was thinking of segregating the agents is to place different teams on different playing fields, but I am not sure whether this will work (e.g. with 4 teams and 2 playing fields, teams 1 and 3 would share the same behavior, and teams 2 and 4 would share the same behavior). A rough sketch of the mapping I have in mind is below.
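Something like this (assuming the Python API exposes the Team Id set in Unity's Behavior Parameters as a `?team=<id>` suffix on the behavior name, and that teams are numbered from 1 as above; I am not sure this interacts well with how teams are trained, which is why I am asking):

```python
def field_for_behavior(behavior_name, teams_per_field=2):
    """Map a behavior name like 'MyAgent?team=3' to a playing-field index,
    assuming consecutive team ids share a field (1,2 -> field 0; 3,4 -> field 1)."""
    team_id = int(behavior_name.split("?team=")[1])
    return (team_id - 1) // teams_per_field
```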
Thanks for your interest and the additional context. We don't actively support the joint-observation / multi-agent use case at the moment. There has been interest, but there is no plan to support it right now. You could try treating each of these "teams" as a single agent with multiple sensors, but in general we don't have true multi-agent support and there is no one official way to proceed.
I am adding an internal tracking number so we can gauge long-term interest.
Multi-Agent Support: MLA-1447
Alright, I understand. Thank you for the clarification.