Food Collector Example Environment

In the food collector environment, multiple agents are interacting at once. During training, are all agents running the same policy? So basically, is it self-play without randomly substituting older policies?

They all use the same brain, which means the same policy, as you said. But I don't think this counts as self-play. They basically learn concurrently in a multi-agent RL environment.
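
To make the difference a bit more concrete, here is a minimal toy sketch in plain Python. It is not ML-Agents code: `Policy`, `shared_policy_agents`, `self_play_opponent`, and the `p_latest` parameter are all hypothetical names made up for illustration. The first part shows several agents pointing at one shared policy, so any update is seen by all of them at once. The second part shows the extra ingredient that self-play usually adds: a pool of frozen older snapshots that an opponent can be sampled from.

```python
import copy
import random


class Policy:
    """Toy stand-in for a learned policy (hypothetical, not an ML-Agents class)."""

    def __init__(self):
        self.weight = 0.0
        self.version = 0  # counts how many updates have been applied

    def update(self, reward):
        # Placeholder "learning step": nudge the weight and bump the version.
        self.weight += 0.1 * reward
        self.version += 1


def shared_policy_agents(num_agents):
    """Shared-brain setup: every agent references the same Policy object."""
    policy = Policy()
    return [policy for _ in range(num_agents)]


def self_play_opponent(current, snapshots, rng, p_latest=0.5):
    """Self-play style sampling: sometimes return a frozen older snapshot.

    With probability p_latest (an assumed parameter) the opponent is the
    current policy; otherwise it is a random earlier snapshot.
    """
    if snapshots and rng.random() > p_latest:
        return rng.choice(snapshots)
    return current


# Shared policy: one update is visible to all agents, no snapshots involved.
agents = shared_policy_agents(4)
agents[0].update(reward=1.0)

# Self-play: keep frozen copies of the learner as it improves.
rng = random.Random(0)
learner = Policy()
snapshots = []
for _ in range(5):
    snapshots.append(copy.deepcopy(learner))  # freeze the current version
    learner.update(reward=1.0)

opponent = self_play_opponent(learner, snapshots, rng)
```

In the shared-policy case there is only ever one version of the policy in play, which is why I would call it concurrent multi-agent learning rather than self-play.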