How to make agents "really" understand a boolean observation

I am having a hard time training agents to really understand a binary value. For instance, in my scene the agents have to collect some items in order to open a gate. The agents are given an observation of whether the gate is open, and they can also see the gate through ray casts. I trained them in two scenarios for a long period of time:

  1. The gate is always visible to the agents: the agents very often stuck to the gate even though it was not open, and only after a large number of steps did they leave to find the required items, which also give positive rewards. But much of the time they came back without having collected all the required items and got stuck in front of the gate again. Note that the agents did sometimes successfully open the gate, but they did not seem to learn from the boolean observation.

  2. Change the tag of the gate (like the Pyramids example): this worked, as the agents could not see the gate until it was tagged “opened”. Once all the items are collected, the tag of the gate changes to “opened”. However, the agents revisited the gate location several times to check whether the gate tagged “opened” was there, rather than learning from the boolean observation. This also happens in the official Pyramids example: if the agent does not find the final target, it goes back and hits the button again.

The agent’s observations are:

  1. one boolean per item indicating whether it has been collected, for 5 items (5 observations)
  2. one boolean indicating whether the gate is open (1 observation)
  3. regular ray casts

All boolean observations have to be true in order to win the game.
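
To make the observation layout described above concrete, here is a minimal Python sketch of the six boolean observations and the win condition. It is only an illustration: the names `GateTaskState`, `boolean_observations`, and `has_won` are hypothetical, the 0.0/1.0 encoding is an assumption, and the ray-cast observations are left out.

```python
from dataclasses import dataclass
from typing import List

NUM_ITEMS = 5  # the post describes five collectible items


@dataclass
class GateTaskState:
    """Hypothetical container for the task's boolean state."""
    items_collected: List[bool]  # one flag per item
    gate_open: bool              # True once the gate has opened


def boolean_observations(state: GateTaskState) -> List[float]:
    """Encode the 5 item flags plus the gate flag as 0.0/1.0 values (assumed encoding)."""
    if len(state.items_collected) != NUM_ITEMS:
        raise ValueError(f"expected {NUM_ITEMS} item flags")
    flags = list(state.items_collected) + [state.gate_open]
    return [1.0 if flag else 0.0 for flag in flags]


def has_won(state: GateTaskState) -> bool:
    """Win condition from the post: every boolean observation is true."""
    return all(state.items_collected) and state.gate_open


state = GateTaskState(items_collected=[True, True, False, True, True], gate_open=False)
print(boolean_observations(state))  # [1.0, 1.0, 0.0, 1.0, 1.0, 0.0]
print(has_won(state))               # False
```

Written this way, the gate flag is just one more input among six, which may be part of why the agents lean on what they see through the ray casts instead.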

Has anyone successfully trained an agent to understand this kind of requirement?

This is a challenging type of problem for RL algorithms. We accomplish this in Pyramids using curiosity. You can read about it here: https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Reward-Signals.md#curiosity-reward-signal
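
As a rough illustration of the idea behind a curiosity-style reward, the sketch below computes an intrinsic bonus from the prediction error of a forward model: the worse the agent predicted the next state's features, the larger the bonus. This is not the ML-Agents implementation. The function names, the plain-list feature vectors, and the `strength` value are all assumptions made for the example.

```python
from typing import Sequence


def curiosity_reward(predicted_next: Sequence[float],
                     actual_next: Sequence[float],
                     strength: float = 0.02) -> float:
    """Intrinsic bonus: scaled half squared error between predicted and actual features."""
    if len(predicted_next) != len(actual_next):
        raise ValueError("feature vectors must have the same length")
    squared_error = sum((p - a) ** 2 for p, a in zip(predicted_next, actual_next))
    return strength * 0.5 * squared_error


def total_reward(extrinsic: float, intrinsic: float) -> float:
    """Combine the environment reward with the curiosity bonus."""
    return extrinsic + intrinsic


# A transition the (hypothetical) forward model predicted perfectly earns no bonus.
familiar = curiosity_reward([0.1, 0.9], [0.1, 0.9])
# A surprising transition earns a positive bonus.
novel = curiosity_reward([0.1, 0.9], [0.9, 0.1])
print(familiar < novel)  # True
```

The bonus rewards the agent for reaching transitions it cannot yet predict, which encourages exploring beyond a familiar spot such as the area in front of the gate.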