Gym Unity - Baselines

Hello guys,

I finished my environment in Unity and now I am trying to “export it to Gym” to try different algorithms (I will do my own implementations afterwards). I am trying Baselines now and I exported the environment as:

env = UnityToGymWrapper(unity_env, uint8_visual=True, flatten_branched=True, allow_multiple_obs=True)

And now, from this line:

model = PPO(MlpPolicy, env, verbose=0)

I am getting the error:

NotImplementedError: Tuple(Box(-inf, inf, (91,), float32)) observation space is not supported

What could I do? I am a bit lost.

PPO in Baselines does not support observations of type Tuple(Box(-inf, inf, (91,), float32)) (which I think corresponds to a single flat vector observation of 91 floats). If you want to use Baselines, you need to create an environment whose observations and actions Baselines can work with.
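One way to get there without touching the Unity side is to flatten the Tuple into a single float32 vector before it reaches the algorithm. The sketch below shows only that transformation in NumPy; `flatten_tuple_obs` is a hypothetical helper, and in practice it would sit inside a `gym.ObservationWrapper` whose `observation_space` is a `Box` of the summed size. With a single vector observation, building `UnityToGymWrapper` without `allow_multiple_obs=True` should also give a plain `Box` directly, since that flag is what wraps the observations in a Tuple.

```python
import numpy as np


def flatten_tuple_obs(obs):
    """Concatenate every part of a Tuple observation into one float32 vector.

    Hypothetical helper: each element (vector or image) is raveled and the
    results are joined in order, so the output length is the sum of the sizes.
    """
    parts = [np.asarray(part, dtype=np.float32).ravel() for part in obs]
    return np.concatenate(parts)


# A Tuple(Box((91,))) observation, e.g. raycasts plus one boolean flag
obs = (np.zeros(91, dtype=np.float32),)
flat = flatten_tuple_obs(obs)
print(flat.shape, flat.dtype)  # (91,) float32
```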

I am using raycasts and one boolean (so yes, vector observations as you mention). How can I find out which kinds of observations Baselines works with? Do I check which observation inputs each algorithm supports, or do I try to change my observations? I just need some direction.

I have not worked with PPO in Baselines in a while; I think you will have better luck looking at their documentation or issues page. If my memory is correct, it should work with a single visual observation, but I am really not sure.

Hi, I have a similar problem with stable_baselines3. How did you solve it?

@simmax21 I had to make a custom environment with the help of @aakarshanc01.
If you can further improve this code, that would also be amazing for me and for other people who come after us:

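import uuid
from collections import defaultdict

import gym
import numpy as np
from gym_unity.envs import UnityToGymWrapper
from mlagents_envs.environment import UnityEnvironment as UE
from mlagents_envs.side_channel.engine_configuration_channel import EngineConfigurationChannel
from mlagents_envs.side_channel.incoming_message import IncomingMessage
from mlagents_envs.side_channel.side_channel import SideChannel
from mlagents_envs.side_channel.stats_side_channel import EnvironmentStats, StatsAggregationMethod

# `config` (time_scale, env_path, no_graphics), `rank`, `pretty_print`, `Colors`,
# `env_callback` and `wandb_run_identifier` are project globals defined elsewhere.


def get_wandb_ue_env():
    # engine config: run the simulation faster than real time
    engine_channel = EngineConfigurationChannel()
    engine_channel.set_configuration_parameters(time_scale=config.time_scale)
    # side channel that forwards Academy stats to Python
    channel = SB3StatsRecorder()
    # one Unity process per worker; distinct ports avoid collisions
    env = UE(config.env_path,
             seed=1,
             worker_id=rank,
             base_port=5000 + rank,
             no_graphics=config.no_graphics,
             side_channels=[engine_channel, channel])

    return env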


class CustomEnv(gym.Env):
    def __init__(self):
        super().__init__()

        env = get_wandb_ue_env()
        # allow_multiple_obs=True returns all observations as a list
        # (visual observations first, vector observation last)
        env = UnityToGymWrapper(env, allow_multiple_obs=True)

        self.env = env
        self.action_space = self.env.action_space
        self.action_size = self.env.action_size
        # SB3's MultiInputPolicy needs a Dict space, and its keys must be strings
        self.observation_space = gym.spaces.Dict({
            "0": gym.spaces.Box(low=0, high=1, shape=(27, 60, 3), dtype=np.float32),  # other build: (40, 90, 3)
            "1": gym.spaces.Box(low=0, high=1, shape=(20, 40, 1), dtype=np.float32),  # other build: (56, 121, 1)
            "2": gym.spaces.Box(low=-np.inf, high=np.inf, shape=(400,), dtype=np.float32),
        })

    @staticmethod
    def tuple_to_dict(s):
        # map the wrapper's observation list onto the Dict space keys
        return {"0": s[0], "1": s[1], "2": s[2]}

    def reset(self):
        return self.tuple_to_dict(self.env.reset())

    def step(self, action):
        s, r, d, info = self.env.step(action)
        return self.tuple_to_dict(s), float(r), d, info

    def close(self):
        self.env.close()
        # free this worker's id/port slot for the next environment
        global rank
        rank -= 1

    def render(self, mode="human"):
        self.env.render()

class SB3StatsRecorder(SideChannel):
    """
    Side channel that receives (string, float) pairs from the environment, so that they can eventually
    be passed to a StatsReporter.
    """

    def __init__(self) -> None:
        # Same UUID as ML-Agents' built-in stats side channel:
        # >>> uuid.uuid5(uuid.NAMESPACE_URL, "com.unity.ml-agents/StatsSideChannel")
        # UUID('a1d8f7b7-cec8-50f9-b78b-d3e165a78520')
        super().__init__(uuid.UUID("a1d8f7b7-cec8-50f9-b78b-d3e165a78520"))
        pretty_print("Initializing SB3StatsRecorder", Colors.FAIL)  # project logging helper
        self.stats: EnvironmentStats = defaultdict(list)
        self.i = 0
        self.wandb_tables: dict = {}

    def on_message_received(self, msg: IncomingMessage) -> None:
        """
        Receive a (key, value, aggregation method) stat from the environment and save it for later retrieval.

        :param msg: message sent from the C# side
        """
        key = msg.read_string()
        val = msg.read_float32()
        agg_type = StatsAggregationMethod(msg.read_int32())

        self.stats[key].append((val, agg_type))

        # keep only the part after the first "/" so each Drone[id] subprocess
        # within this wandb run logs under the same name
        key = key.split("/")[1]
        self.i += 1

        if env_callback is not None and wandb_run_identifier == "test":
            my_table_id: str = "Performance[{}]".format(wandb_run_identifier)
            env_callback(my_table_id, key, val)

    def get_and_reset_stats(self) -> EnvironmentStats:
        """
        Returns the current stats, and resets the internal storage of the stats.
        """
        s = self.stats
        self.stats = defaultdict(list)
        return s
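A note on `worker_id` and `base_port`: as far as I know, `mlagents_envs`' `UnityEnvironment` connects on `base_port + worker_id`, so passing both `worker_id=rank` and `base_port=5000 + rank` offsets the port twice. That is harmless (the ports stay distinct) but redundant. The arithmetic, with `unity_port` as a hypothetical stand-in for that internal computation:

```python
def unity_port(base_port: int, worker_id: int) -> int:
    # Assumed to mirror UnityEnvironment's internal port choice: base_port + worker_id
    return base_port + worker_id


ranks = range(4)
as_written = [unity_port(5000 + rank, rank) for rank in ranks]  # both offsets applied
single_offset = [unity_port(5000, rank) for rank in ranks]      # offset once, via worker_id
print(as_written)     # [5000, 5002, 5004, 5006]
print(single_offset)  # [5000, 5001, 5002, 5003]
```

Either form works; keeping `base_port` fixed and letting `worker_id` do the offset just makes the port layout easier to predict.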

I then register this environment through Gym's registration mechanism and create it everywhere else with gym.make("my_id"). Since the environment reads its settings from the config file, it adapts to different builds without needing any extra code to register "new" builds.

Also, something to take into account: SubprocVecEnv is a bit unstable, at least for me. No context from variables defined earlier is passed into the subprocesses, so training has to be fully separated and then brought back together; you might choose a different strategy for that. I decided to restrict myself to a single (non-vectorized) environment for now and just train for ~20 hours.
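Because changes to a global such as `rank` in one process are not visible in the others, a common pattern is to give each worker its own factory with the rank bound explicitly; SB3's `SubprocVecEnv` takes a list of such zero-argument callables. The sketch below uses a hypothetical `make_env` that returns a plain description instead of a real Unity env, and also shows the late-binding pitfall a bare `lambda` in a loop would hit.

```python
from functools import partial


def make_env(rank: int, base_port: int = 5000) -> dict:
    # Hypothetical stand-in for building one Unity env (e.g. CustomEnv) for a worker.
    return {"worker_id": rank, "base_port": base_port}


n_envs = 4

# Pitfall: every lambda looks up `i` when called, i.e. after the loop has finished.
late_bound = [lambda: make_env(i) for i in range(n_envs)]

# Fix: bind the rank at the moment each factory is created.
env_fns = [partial(make_env, rank) for rank in range(n_envs)]

print([fn()["worker_id"] for fn in late_bound])  # [3, 3, 3, 3]
print([fn()["worker_id"] for fn in env_fns])     # [0, 1, 2, 3]
```

With Stable-Baselines3 the list would then be passed as `SubprocVecEnv(env_fns)`, which assumes `CustomEnv` is changed to accept the rank as a constructor argument instead of reading the global.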

@simmax21 @vincentpierre if you can help me with this issue, I would really appreciate it too: UnityGymWrapper Crash after 2M iterations - Unity Forum

@ademord thank you for the custom env.
Just a little change: changing the dict keys from int to str makes it work.

Edit:
This was the error:
raise TypeError("module name should be a string. Got {}".format(
TypeError: module name should be a string. Got int
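That message comes from PyTorch: SB3's `MultiInputPolicy` builds one sub-network per key of the `Dict` observation space and stores them in an `nn.ModuleDict`, and `torch.nn.Module.add_module` only accepts string names. A version of `tuple_to_dict` with string keys, plus a length check, might look like this (the `SPACE_KEYS` tuple is hypothetical and must match the keys of `observation_space`):

```python
import numpy as np

# Hypothetical: must equal the keys of the gym.spaces.Dict in CustomEnv
SPACE_KEYS = ("0", "1", "2")


def tuple_to_dict(obs):
    """Map a tuple/list of observations onto string keys, in order."""
    if len(obs) != len(SPACE_KEYS):
        raise ValueError(f"expected {len(SPACE_KEYS)} observations, got {len(obs)}")
    return {key: part for key, part in zip(SPACE_KEYS, obs)}


obs = (np.zeros((27, 60, 3)), np.zeros((20, 40, 1)), np.zeros(400))
d = tuple_to_dict(obs)
print(sorted(d))                           # ['0', '1', '2']
print(all(isinstance(k, str) for k in d))  # True
```

Descriptive names such as "camera" or "rays" would work equally well, as long as `observation_space` and the returned dict use the same keys.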

@ademord sorry, but do you know how to pass Academy.Instance.StatsRecorder data to Python?

I want to log all the data recorded by Academy.Instance.StatsRecorder from my Unity environment converted to Gym.
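As far as I can tell, the `SB3StatsRecorder` side channel posted above already receives that data: it uses the same UUID as ML-Agents' built-in stats channel, which is what `Academy.Instance.StatsRecorder.Add(...)` writes to on the C# side, and `get_and_reset_stats()` returns `{key: [(value, aggregation), ...]}`. One way to turn that into a single number per key before logging (e.g. to wandb) is sketched below. The `Agg` enum is a hypothetical local stand-in for `StatsAggregationMethod` covering only average, most-recent and sum, and the keys in the example are made up.

```python
from collections import defaultdict
from enum import Enum
from statistics import mean


class Agg(Enum):
    # Hypothetical stand-in for mlagents_envs' StatsAggregationMethod
    AVERAGE = 0
    MOST_RECENT = 1
    SUM = 2


def reduce_stats(stats):
    """Collapse {key: [(value, Agg), ...]} into {key: float}.

    Assumes every entry for a key uses the same aggregation method,
    and uses the method of the last entry.
    """
    out = {}
    for key, entries in stats.items():
        if not entries:
            continue
        values = [v for v, _ in entries]
        method = entries[-1][1]
        if method is Agg.MOST_RECENT:
            out[key] = values[-1]
        elif method is Agg.SUM:
            out[key] = sum(values)
        else:  # Agg.AVERAGE
            out[key] = mean(values)
    return out


stats = defaultdict(list)
stats["Drone/Speed"] += [(2.0, Agg.AVERAGE), (4.0, Agg.AVERAGE)]
stats["Drone/Checkpoints"] += [(1.0, Agg.SUM), (1.0, Agg.SUM), (1.0, Agg.SUM)]
stats["Drone/Battery"] += [(0.9, Agg.MOST_RECENT), (0.7, Agg.MOST_RECENT)]
print(reduce_stats(stats))
# {'Drone/Speed': 3.0, 'Drone/Checkpoints': 3.0, 'Drone/Battery': 0.7}
```

Calling something like this on `get_and_reset_stats()` once per rollout (for example from an SB3 callback) would keep the logged values in step with training.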