How do we reset the environment during training?
Looking at these docs and elsewhere, I can see how to implement OnEpisodeBegin() on my agents and how to call EndEpisode() for each of them as each finishes. I can also see how to subscribe to the Academy.Instance.OnEnvironmentReset event so that I can reset the overall environment when it actually does reset. However, I cannot see how to cause the environment to reset.
My current use case is that I have up to several agents on a “team”. When everyone on a team has died, the game ends and I need to reset the environment. As each of them individually dies, his own EndEpisode() is called, and that’s fine. But once they have all died, I need to reset everything (including non-agent objects in the environment, to put things back to how they should be at the start of a game).
It looks like Academy used to have something like a Done() method on it. I’m using 0.15.1 and don’t see anything like that.
So how does one reset the environment now?
Thanks.
Upon further inspection of the examples and docs, it looks like maybe ml-agents simply isn’t equipped for this?
To simplify the description of the problem: if I have several agents in a scene learning at the same time, but I only want to reset the environment when all of their episodes have ended, can I do that?
The environment in question is large and expensive (a fully populated terrain), so it doesn’t seem practical for me to have several instances of it in a scene (the way the examples do with lots of tiny tennis boards). On top of that, I actually do want the agents to learn how to cooperate eventually, so I don’t want them isolated anyway (at least in the long run).
So is this idea of only resetting the environment when all agents are finished supported?
Thanks!
Hi,
The environment will not reset when all the agents are Done. Just like when making a game, it is the responsibility of the environment to keep track of the “players”. If all the players are dead, the game should restart on its own. Note that this can be done by calling “Academy.Instance.OnEnvironmentReset.Invoke()” directly.
Academy.Instance.OnEnvironmentReset will be called by Python when using certain features that require the whole environment to reset (curriculum learning for example). It is also called when using the UnityEnvironment.reset method on Python (if you are using our environment API directly or the gym wrapper).
The Academy.Instance.OnEnvironmentReset event is a tool that allows Python to restart the game when it wants. But it will not be called automatically when all the agents have ended their episodes.
When I restart the game, though, do I need to call something on the agents to “reset” their learning? Otherwise, they are learning that by all dying together they can cause a reset, which might have reward benefits in some scenarios, right?
(Or conversely, to STOP the learning of the dead ones until they are all dead.)
For example, if each agent has OnEpisodeBegin() called right after he dies, how do I make them wait until the entire team is dead before they start learning again? Obviously I can control the game logic itself to make them do nothing until all of the agents have reset, but by forcing them to do nothing (ignoring their input), they would be learning that their attempts to provide input values are not doing anything for some period of time, right?
Ah, this sounds like a great use case for NOT resetting agents but instead destroying them and respawning them.
Instead of calling “EndEpisode” when the agent dies, simply Destroy the Agent.
Destroy(AgentGameObject) will automatically tell Python that the Agent terminated the task (either in success or in failure) and the Agent will not reset.
When resetting the environment, destroy the remaining agents (if any) and create new ones.
If destroying Agents is too costly, disabling and re-enabling Agents should do the trick as well.
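A minimal sketch of the disable/re-enable approach, written as plain C# so it runs outside Unity. `TeamAgent`, `TeamManager`, `OnAgentDied` and `ResetEnvironment` are hypothetical names: in a real project `TeamAgent` would be your `Agent` subclass, and `Disable()`/`Enable()` would wrap `gameObject.SetActive(false)`/`gameObject.SetActive(true)`. The assumption, following the reply above, is that an inactive agent takes no further decisions until it is re-enabled, so dead agents are not "learning" during the wait.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for an ML-Agents Agent subclass. In Unity,
// Disable()/Enable() would wrap gameObject.SetActive(false/true).
public class TeamAgent
{
    public bool Enabled { get; private set; } = true;
    public int EpisodesStarted { get; private set; }

    // Assumption: an inactive agent takes no decisions until re-enabled.
    public void Disable() => Enabled = false;

    // Re-enabling stands in for the agent starting a fresh episode.
    public void Enable()
    {
        Enabled = true;
        EpisodesStarted++;
    }
}

// Hypothetical environment-side manager that owns the "whole team is dead" rule.
public class TeamManager
{
    private readonly List<TeamAgent> agents;
    public int EnvironmentResets { get; private set; }

    public TeamManager(IEnumerable<TeamAgent> team)
    {
        agents = team.ToList();
    }

    // Called by game logic when an agent dies, instead of EndEpisode().
    public void OnAgentDied(TeamAgent agent)
    {
        if (!agent.Enabled) return; // ignore duplicate death reports
        agent.Disable();
        if (agents.All(a => !a.Enabled))
        {
            ResetEnvironment();
        }
    }

    // The environment resets itself once its own condition is met.
    public void ResetEnvironment()
    {
        EnvironmentResets++;
        // Restore non-agent objects (terrain props, pickups, ...) here.
        foreach (var agent in agents)
        {
            agent.Enable();
        }
    }
}
```

The design choice is that the team manager, not the agents, decides when the whole environment resets; individual agents only report their own death.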
Would that solve your issue?
Isn’t OnEnvironmentReset an event, meaning it is not possible to Invoke() it from anywhere but the Academy class? Also, I cannot find EndEpisode() as a callable method in the Agent class. Did something change? Edit: Is EndEpisode() called Done() in some versions?
Yes, Vincent, that sounds like the perfect solution. I didn’t know that destroying them did this. Sounds great so I will give it a try. Thank you for your support.
EndEpisode is definitely there in the latest release (0.15.1).
I have the same configuration as Claytonious: I have 2 teams of agents (sharing the same brain) battling each other, and the environment resets when one team gets killed. I have tried calling Academy.Instance.OnEnvironmentReset.Invoke(), but without success, as I suppose we cannot call this event outside the Academy class. I have also tried the solution given by vincentpierre. Destroying all agents and then instantiating them again did not call my custom reset method EnvironmentReset(), which I added to the Academy event like so:
Academy.Instance.OnEnvironmentReset += EnvironmentReset;
I am running the new release 1.0. Can anyone enlighten me on what could have gone wrong with my logic? Thanks
Hi,
I think your logic is correct, but the goal of OnEnvironmentReset is to reset the simulation or game from Python.
I think what you should do is have a method that resets the environment (called EnvironmentReset(), like in your example).
When the game needs to reset (because too many agents died, for instance), call EnvironmentReset() manually.
This is regular simulation behavior: it resets on its own when the conditions are met.
IN ADDITION, you should use Academy.Instance.OnEnvironmentReset += EnvironmentReset so that Python can reset the environment without having to wait for the conditions of a reset to be met.
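A plain C# sketch of that advice: one EnvironmentReset() method, called manually when the game's own condition is met and also subscribed to the reset event. Because a C# `event` can only be raised from inside the class that declares it, outside code can subscribe with `+=` but cannot call `Invoke()`; the sketch assumes, as the thread suggests, that `Academy.Instance.OnEnvironmentReset` is declared that way. `AcademyStandIn`, `SimulatePythonReset`, `BattleArena` and `OnAgentKilled` are hypothetical names, not ML-Agents API.

```csharp
using System;

// Hypothetical stand-in for Academy.Instance. OnEnvironmentReset is a C#
// event, so only code inside this class can raise it; outside code can
// only subscribe (+=) or unsubscribe (-=).
public class AcademyStandIn
{
    public event Action OnEnvironmentReset;

    // Stands in for Python requesting a reset (e.g. curriculum changes or
    // UnityEnvironment.reset()).
    public void SimulatePythonReset() => OnEnvironmentReset?.Invoke();
}

// Hypothetical two-team arena with a single EnvironmentReset() method.
public class BattleArena
{
    private readonly int teamSize;
    public int TeamAAlive { get; private set; }
    public int TeamBAlive { get; private set; }
    public int ResetCount { get; private set; }

    public BattleArena(AcademyStandIn academy, int teamSize)
    {
        this.teamSize = teamSize;
        TeamAAlive = teamSize;
        TeamBAlive = teamSize;
        // Trigger 1: Python-driven resets arrive through the event.
        academy.OnEnvironmentReset += EnvironmentReset;
    }

    public void EnvironmentReset()
    {
        ResetCount++;
        TeamAAlive = teamSize;
        TeamBAlive = teamSize;
        // Respawn or re-enable agents and restore scene objects here.
    }

    // Trigger 2: the game resets itself when one team is wiped out.
    public void OnAgentKilled(bool onTeamA)
    {
        if (onTeamA) TeamAAlive--; else TeamBAlive--;
        if (TeamAAlive == 0 || TeamBAlive == 0)
        {
            EnvironmentReset();
        }
    }
}
```

Destroying or respawning agents does not raise the event by itself, which is consistent with EnvironmentReset() not being called in the scenario described above; the game has to call it directly.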
Hi,
I have a similar dilemma. In my case, when all the agents reach the goal, I want to change the game object that each agent is controlling. The handler subscribed with Academy.Instance.OnEnvironmentReset += EnvironmentReset seemed to be called only once, at the beginning of training, and not at the start of each new episode. The solution was to maintain a list of agents, updated from each agent's own episode-end call (onepisodend), and to make the change once the last agent had reset.
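One way to implement that bookkeeping, sketched in plain C#. It swaps the list for a `HashSet<int>` so that an agent reporting twice is only counted once. `EpisodeCoordinator`, `ReportEpisodeEnded`, `AllEpisodesEnded` and the integer agent IDs are hypothetical; each agent would call `ReportEpisodeEnded` from its own episode-end logic, for example just before calling `EndEpisode()`.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical coordinator: each agent reports when its episode ends, and the
// shared change (e.g. swapping the controlled game objects) runs once per round.
public class EpisodeCoordinator
{
    private readonly int teamSize;
    // A set instead of a list, so repeated reports from one agent count once.
    private readonly HashSet<int> finished = new HashSet<int>();

    public int RoundsCompleted { get; private set; }
    public event Action AllEpisodesEnded;

    public EpisodeCoordinator(int teamSize) { this.teamSize = teamSize; }

    // Call from each agent's own episode-end logic, e.g. right before EndEpisode().
    public void ReportEpisodeEnded(int agentId)
    {
        finished.Add(agentId);
        if (finished.Count < teamSize) return;

        finished.Clear();           // start tracking the next round
        RoundsCompleted++;
        AllEpisodesEnded?.Invoke(); // e.g. change what each agent controls
    }
}
```

Unlike the disable/re-enable approach, agents here keep running their own episodes; the coordinator only decides when the shared, environment-wide change happens.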