I have a question about using the POCA trainer.

I studied and understood the PPO algorithm, the default algorithm in ML-Agents. However, I am not sure how the POCA trainer is implemented in ML-Agents. I have read two papers on multi-agent RL. Is it right that the POCA trainer trains policies in a centralized way, but the agents take actions separately? I would appreciate more details. Thank you!

Hi @xogur6889 ,
The POCA algorithm was developed by the ML-Agents research team. They’re working on an arXiv submission, but it’s not ready yet. I’ll see if there are any other details that they can share sooner…


I talked to the research folks some more. The high-level explanation is that POCA trains a group of agents to maximize a shared common reward. It also supports the removal and addition of agents at runtime, which I believe other multi-agent trainers don’t handle.
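On the Unity side, both of those features surface through the group API. Here’s a minimal C# sketch using ML-Agents’ SimpleMultiAgentGroup; the TeamController class, its callbacks, and the reward value are made-up examples, not part of the library:

```csharp
using Unity.MLAgents;
using UnityEngine;

// Hypothetical controller that owns one POCA training group.
public class TeamController : MonoBehaviour
{
    public Agent[] initialAgents;          // assigned in the Inspector
    private SimpleMultiAgentGroup m_Group;

    void Start()
    {
        m_Group = new SimpleMultiAgentGroup();
        foreach (var agent in initialAgents)
        {
            m_Group.RegisterAgent(agent);  // all agents train as one group
        }
    }

    // Agents can join mid-episode...
    public void OnAgentSpawned(Agent agent)
    {
        m_Group.RegisterAgent(agent);
    }

    // ...and leave before the episode ends.
    public void OnAgentRemoved(Agent agent)
    {
        m_Group.UnregisterAgent(agent);
    }

    public void OnSharedGoalReached()
    {
        m_Group.AddGroupReward(1.0f);      // one reward shared by the whole group
        m_Group.EndGroupEpisode();         // credit assignment is the trainer's job
    }
}
```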

The POCA trainer trains policies in a centralized way, but each policy acts independently during inference.
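At inference time the centralized part is gone: you assign the same trained model to every agent, and each one acts from its own local observations. A rough sketch (the behavior name and the fields are placeholders for your own setup):

```csharp
using Unity.Barracuda;
using Unity.MLAgents;
using UnityEngine;

// Gives every agent its own copy of the trained policy; there is no
// shared component between the agents at runtime.
public class AssignTrainedPolicy : MonoBehaviour
{
    public NNModel trainedModel;   // the .onnx file exported by mlagents-learn
    public Agent[] teamAgents;

    void Start()
    {
        foreach (var agent in teamAgents)
        {
            // Same weights everywhere, but each agent decides independently.
            agent.SetModel("MyBehavior", trainedModel);
        }
    }
}
```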


Thank you for your help. I look forward to further details.

I didn’t know that. Thank you! I’m still wondering how it was implemented, though. Is it possible to share more?

Hi @celion_unity ,
Is there anything new?

The team is still working on a submission to arXiv, and there’s nothing else to share right now. When the arXiv paper is available, I’ll update you.


Thanks!!! I’ll just wait. haha :)

Hi! Is there any news yet?

Hello, about this paper: it mentions RSA (Residual Self-Attention), but I can’t find the corresponding setting in the …yaml config.