Training several ML-Agents brains (RL) in the same static training environment (e.g. terrain)

Hi there!

I'm wondering if there's a specific reason the Unity samples and usage guidelines suggest duplicating the whole training "ecosystem" (e.g. terrain and props) to train multiple agents (reinforcement learning) at the same time.

It feels a bit like overloading the scene with loads of colliders and renderers by duplicating the same static, non-changing environment.

Just for fun (or as a visual demonstration), I'd love to get the typical "all agents in a generation spreading from a single point" effect seen in a lot of ML videos not related to Unity.

Maybe I'm missing something?
Assuming that no interaction can alter the environment an agent can move in (e.g. a static terrain or racing track),
wouldn't ignoring or disabling collisions among agents be better than duplicating complex environments?
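For what it's worth, Unity can already disable agent-vs-agent collisions globally via the physics layer system. A minimal sketch, assuming all agents sit on a dedicated layer (the layer name "Agents" here is just an example):

```csharp
using UnityEngine;

// Disable agent-vs-agent collisions globally, so many agents can share
// one static environment without pushing each other around.
public class AgentCollisionSetup : MonoBehaviour
{
    void Awake()
    {
        int agentLayer = LayerMask.NameToLayer("Agents");
        // Ignore collisions between any two colliders on the "Agents" layer;
        // collisions with the terrain/track layers are unaffected.
        Physics.IgnoreLayerCollision(agentLayer, agentLayer, true);
    }
}
```

The same effect can be achieved without code via Edit > Project Settings > Physics > Layer Collision Matrix.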

I'm still tinkering with Barracuda and ML-Agents (and ML in general), so I may well be missing something.

Part 2:
I'm also thinking about resetting agents at some distance from each other, so that they can eventually collide and learn to avoid each other. That would let me do all the training in a single environment, without needing to turn the environment + agents into a prefab.

Is there any ready-to-use solution that handles corner cases I may be overlooking? Have you already done this? Is it so trivial that no one actually writes about it, or is it notoriously impossible?

Thanks for your time.


In theory it's doable, but totally unnecessary :) The actual training happens outside of Unity (in the Python trainer), so any processing power spent on fancy props, aesthetics, or game feel during training is spent needlessly, since training only exists to produce the behaviours for your agents. Plus, you ideally duplicate your environment many times in one scene so your agents can train simultaneously.

In practice the cost is probably negligible if you have a halfway decent computer, unless your scene is REALLY impressive :) But for the actual USAGE of the behaviours in your game end-product, things should definitely look nice! You can use cubes to train a smart enemy behaviour in a simple recreation of your environment, and then put that behaviour on fully 3D monsters or whatnot in your actual game.
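Duplicating the environment usually just means instantiating one self-contained training-area prefab several times. A minimal sketch (the `areaPrefab` field, copy count, and spacing are placeholder values):

```csharp
using UnityEngine;

// Spawn N copies of a self-contained training area side by side,
// so all agents train simultaneously in one scene.
public class TrainingAreaSpawner : MonoBehaviour
{
    public GameObject areaPrefab;   // prefab containing terrain + agent(s)
    public int copies = 8;
    public float spacing = 100f;    // far enough apart that areas never overlap

    void Awake()
    {
        for (int i = 0; i < copies; i++)
        {
            Instantiate(areaPrefab,
                        new Vector3(i * spacing, 0f, 0f),
                        Quaternion.identity);
        }
    }
}
```

Keeping each area fully self-contained (agents only observe things inside their own copy) is what makes this kind of duplication safe.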

For your part 2: resetting agents at some distance sounds like it could be done with the normal reward/penalty system. If two agents touch each other, give them a penalty of -1 and move them apart. For ready-to-use solutions you may be on your own, but I encourage you to play and experiment, as ML is a lot of fun :)
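That penalty-and-separate idea might look roughly like this, assuming a recent ML-Agents release where `Agent` exposes `AddReward`; the "Agent" tag and the respawn logic are placeholders to adapt to your setup:

```csharp
using Unity.MLAgents;
using UnityEngine;

// Penalize agent-agent contact and teleport the agent away,
// so agents learn to avoid each other in a shared environment.
public class AvoidOtherAgents : Agent
{
    public float respawnRadius = 10f;   // how far apart to move colliding agents

    void OnCollisionEnter(Collision collision)
    {
        if (collision.gameObject.CompareTag("Agent"))
        {
            AddReward(-1f);  // penalty for touching another agent

            // Move this agent to a random spot nearby; the other agent's
            // own copy of this script handles its side of the collision.
            Vector2 offset = Random.insideUnitCircle * respawnRadius;
            transform.position += new Vector3(offset.x, 0f, offset.y);
        }
    }
}
```

Whether to also end the episode (`EndEpisode()`) on contact, instead of just moving the agents apart, depends on how harshly you want collisions treated.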