Need help with turn-based ML Agents player setup

I am trying to make a turn-based game using ML-Agents. To start with something simple, I am building “Connect Four”. At the moment I am just trying to get a “Player” brain to work properly, and I need some help.

Based on the documentation, it looks like I should be able to use “On Demand Decision Making” for event-based input. I have created a component that watches for any of the valid inputs (the same keys used for the “Discrete Player Actions”) and then calls “RequestDecision” on whichever agent should be taking its turn.

What I expected to happen:
The documentation mentions that “RequestDecision” starts an “observation-decision-action-reward cycle”, so I expected the agent's callbacks to run in this order:

  • observation → “CollectObservations” should be invoked; I also want to call “SetActionMask” here
  • decision → determined by the keyboard input
  • action → “AgentAction” should be invoked
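
To make the expected order concrete, here is a small Python model of it. This is not Unity or ML-Agents code: `Agent`, `TurnManager`, `on_key`, and `full_columns` are hypothetical names that only stand in for my input-watching component and the agent callbacks. It is a sketch of the behavior I expected, assuming the key press is the decision and that masked columns are simply rejected.

```python
from typing import List


class Agent:
    """Hypothetical stand-in for an agent; it only records callback order."""

    def __init__(self, name: str, log: List[str]) -> None:
        self.name = name
        self.log = log

    def collect_observations(self, full_columns: List[int]) -> List[int]:
        # Stands in for CollectObservations plus SetActionMask:
        # returns the columns that must not be chosen this turn.
        self.log.append(f"{self.name}: observe")
        return list(full_columns)

    def agent_action(self, column: int) -> None:
        # Stands in for AgentAction.
        self.log.append(f"{self.name}: act {column}")


class TurnManager:
    """Forwards a key press only to the agent whose turn it is."""

    def __init__(self, agents: List[Agent], log: List[str]) -> None:
        self.agents = agents
        self.log = log
        self.current = 0                   # index of the agent to move
        self.full_columns: List[int] = []  # columns that are masked out

    def on_key(self, column: int) -> bool:
        agent = self.agents[self.current]
        masked = agent.collect_observations(self.full_columns)  # 1. observe
        if column in masked:
            # A masked move is rejected and the turn does not pass.
            self.log.append(f"{agent.name}: rejected {column}")
            return False
        self.log.append(f"{agent.name}: decide {column}")       # 2. decide
        agent.agent_action(column)                              # 3. act
        self.current = (self.current + 1) % len(self.agents)
        return True


log: List[str] = []
manager = TurnManager([Agent("P1", log), Agent("P2", log)], log)
manager.on_key(3)  # P1 moves
manager.on_key(4)  # P2 moves
print(log)
```

In this model each key press produces exactly one observe, decide, act sequence for the active agent, and the other agent stays idle until the turn passes.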

What actually happened:

  • “AgentAction” is invoked BEFORE “CollectObservations”.
  • The move decision ALWAYS picks action index ‘0’ first, for both players, regardless of my input.
  • The move I actually selected seems to be queued and used on subsequent actions, but not always.
  • I can’t figure out whether my event should use Input’s “GetKeyDown”, “GetKey”, or “GetKeyUp”, because none of them seems to line up consistently with the inputs configured under “Discrete Player Actions”.
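
For reference, this is how I understand the three Input calls to differ, modeled in plain Python rather than Unity code. The function `key_states` and the frame list are hypothetical; the sketch assumes the documented Unity behavior that “GetKey” is true on every frame the key is held, “GetKeyDown” only on the frame the press starts, and “GetKeyUp” only on the frame it is released. I have not confirmed how the Player brain itself reads its keys.

```python
from typing import List, Tuple


def key_states(held: List[bool]) -> List[Tuple[bool, bool, bool]]:
    """For each frame, return (get_key, get_key_down, get_key_up)."""
    states = []
    previous = False  # key state on the previous frame
    for current in held:
        get_key = current
        get_key_down = current and not previous  # press started this frame
        get_key_up = previous and not current    # key released this frame
        states.append((get_key, get_key_down, get_key_up))
        previous = current
    return states


# One key press that is held for three frames, then released.
frames = [False, True, True, True, False]
states = key_states(frames)

# Number of RequestDecision calls each trigger would cause for this press.
requests_with_get_key = sum(1 for s in states if s[0])
requests_with_get_key_down = sum(1 for s in states if s[1])
print(requests_with_get_key, requests_with_get_key_down)
```

If my component triggered on “GetKey”, a single press held for several frames would request several decisions, which might explain some of the queued moves; “GetKeyDown” would request only one.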

Is anyone else experiencing similar problems? How am I supposed to have two competing player agents in the scene and make sure that only one of them responds to input at a time, and that the move they pick is actually the one that keyboard input specified?

Environment:
I am using Unity 2018.3.0f2 Personal.
I have updated ML-Agents to the current master commit (0.6.0).

I also posted an issue on the ML-Agents repository.

In that issue I describe a workaround that uses a heuristic brain, in case it helps anyone.