Resetting env from mlagents every time an episode ends without using curriculum learning

Hello,

I am trying to re-instantiate the Unity environment from mlagents after every episode ends. Currently I am doing this in a very hacky way in the environment’s step function like so:

        if self.get_step_result(group_name).done[0]:
            self.reset()

Inside the reset function I have placed some code to re-instantiate the Unity environment through the side channel. The problem is that the TensorBoard stats for cumulative reward and episode length are no longer updated. Training also sometimes breaks for various reasons. I have attached the reset function below as well.

Where would be the best place to have this side channel communication happen while preserving normal training behavior?

    def reset(self, arenas_configurations: ArenaConfig = None) -> None:
        ac = ArenaConfig(rc(self.counter))  # Random env creator
        arenas_configurations_proto = ac.to_proto()
        arenas_configurations_proto_string = arenas_configurations_proto.SerializeToString(
            deterministic=True
        )
        self.arenas_parameters_side_channel.send_raw_data(
            bytearray(arenas_configurations_proto_string)
        )
        try:
            super().reset()
        except UnityTimeOutException as timeoutException:
            if self.play:
                pass
            else:
                raise timeoutException
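(Editor's aside: the reset-on-done pattern being described can be sketched with a stand-in environment. The DummyEnv and AutoResetEnv names below are hypothetical, not part of mlagents; the point is that the auto-reset hook wraps the base step rather than replacing it.)

    class DummyEnv:
        """Minimal stand-in for an environment with a step/reset cycle."""
        def __init__(self, episode_length=3):
            self.episode_length = episode_length
            self.t = 0
            self.reset_count = 0

        def reset(self):
            self.t = 0
            self.reset_count += 1

        def step(self):
            self.t += 1
            return self.t >= self.episode_length  # done flag

    class AutoResetEnv(DummyEnv):
        """Resets automatically whenever an episode finishes."""
        def step(self):
            done = super().step()
            if done:
                self.reset()  # new episode starts immediately
            return done

    env = AutoResetEnv(episode_length=2)
    for _ in range(6):
        env.step()
    print(env.reset_count)  # 3 resets: one per finished 2-step episode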

I’ll fire this over to the team for some guidance!


Hi,
It sounds like you’re using the mlagents-learn command and modifying the code that it runs, but I’m not quite sure. If that’s the case, you should make sure super().step() is being called from your step() implementation; if not, then I don’t think the trainer will know about the end of the episode, which would explain why cumulative reward and episode length aren’t updating in TensorBoard.

I think the side channel information there will be sent properly. The message you queue there should get gathered by the side channel message-parsing code, which gets called during UnityEnvironment.reset().

Hi and thanks for your help!

I should have given more detail.

I am using mlagents-envs==0.15.0

I am directly overriding the step function in mlagents_envs/environment.py and simply adding those last two lines, so there should be no need to call super().step() again. The side channel information is indeed being sent properly, given that the environment is reset correctly. I just don’t understand why the TensorBoard stats aren’t updating, and why I get a diverse set of errors after a few million steps. I’ll try to rerun and see if I can share one of the error tracebacks.

Here is the overridden step function, which calls the reset function I shared in the first post. The last two lines are the only code I added. Note that this will only work for one arena per environment.

    @timed
    def step(self) -> None:
        if self._is_first_message:
            return self.reset()
        if not self._loaded:
            raise UnityEnvironmentException("No Unity environment is loaded.")
        # fill the blanks for missing actions
        for group_name in self._env_specs:
            if group_name not in self._env_actions:
                n_agents = 0
                if group_name in self._env_state:
                    n_agents = self._env_state[group_name].n_agents()
                self._env_actions[group_name] = self._env_specs[
                    group_name
                ].create_empty_action(n_agents)
        step_input = self._generate_step_input(self._env_actions)
        with hierarchical_timer("communicator.exchange"):
            outputs = self.communicator.exchange(step_input)
        if outputs is None:
            raise UnityCommunicationException("Communicator has stopped.")
        self._update_group_specs(outputs)
        rl_output = outputs.rl_output
        self._update_state(rl_output)
        self._env_actions.clear()
        if self.get_step_result(group_name).done[0]:
            self.reset()
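(Editor's note: the final if reuses group_name after the for loop ends, so only the last-iterated behavior's done flag is inspected, consistent with the one-arena caveat above. A hedged sketch of checking every behavior instead, using a plain dict as a stand-in for the real environment state; in the real code the values are step-result objects, not dicts.)

    # Stand-in for the environment state: behavior name -> step result.
    env_state = {
        "BehaviorA": {"done": [True]},   # this episode just finished
        "BehaviorB": {"done": [False]},
    }

    # Instead of inspecting only the leaked loop variable, scan all behaviors:
    any_done = any(state["done"][0] for state in env_state.values())
    if any_done:
        pass  # this is where the reset would be triggered
    print(any_done)  # True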