DecisionStep vs TerminalStep

Hi all,
I’m noticing that at runtime, some agent IDs show up in both the DecisionStep and the TerminalStep.
To be more specific, in Python I can query DecisionStep.agent_id and TerminalStep.agent_id, and in some cases the same agent IDs appear in both lists.

Which is the correct data point for the duplicated Agent ID?
Is this a possible bug?

See the image below for an example from 3DBall with 12 agent IDs.

I notice that the DecisionStep has a reward of 0 for Agents 1, 7 and 9, and that the observation vector for those agents differs between the two objects. I would assume that the observation in the TerminalStep is the one from the actual terminal step.

Is my assumption correct?

Thanks!


Hi,

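If an agent ID is in both DecisionStep and TerminalStep, it means that the Agent was reset in Unity and immediately requested a decision.
In your example, Agents 1, 7 and 9 had their episode terminated, started a new episode and requested a new decision, all in the same call to env.step(). So neither entry is a duplicate: the TerminalStep entry holds the last observation and reward of the episode that just ended, and the DecisionStep entry holds the first observation of the new episode.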
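
To make that concrete, here is a minimal sketch of how you could sort the agent IDs after a step. It does not use the ML-Agents package: StepBatch and group_agents are hypothetical names, the rewards are made up, and I am assuming the 12 agents are numbered 0 to 11. The only thing borrowed from the real API is the idea that each object exposes an agent_id list.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class StepBatch:
    """Hypothetical stand-in for one batch of steps (not the ML-Agents class)."""
    agent_id: List[int]
    reward: List[float]


def group_agents(decision: StepBatch, terminal: StepBatch) -> Dict[str, List[int]]:
    """Group agent IDs by which batch(es) they appear in after one env.step()."""
    decision_ids = set(decision.agent_id)
    terminal_ids = set(terminal.agent_id)
    return {
        # In both: the episode ended and a new one started in the same step.
        "ended_and_restarted": sorted(decision_ids & terminal_ids),
        # Terminal only: the episode ended, no new decision requested yet.
        "ended_only": sorted(terminal_ids - decision_ids),
        # Decision only: the episode is still running.
        "continuing": sorted(decision_ids - terminal_ids),
    }


# Made-up data shaped like the 3DBall example: 12 agents, all requesting a decision.
decision = StepBatch(
    agent_id=list(range(12)),
    reward=[0.0 if i in (1, 7, 9) else 0.1 for i in range(12)],
)
# Agents 1, 7 and 9 also finished an episode during this step.
terminal = StepBatch(agent_id=[1, 7, 9], reward=[-1.0, -1.0, -1.0])

groups = group_agents(decision, terminal)
print(groups["ended_and_restarted"])
```

With real data you would pass in the agent_id values from your two objects instead. For the IDs in the "ended_and_restarted" group, take the final reward and observation from the terminal side and treat the decision side as the start of the next episode.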

Hi Vincent, thank you SO much for the clarification!!!