Regarding env.get_steps(behavior_name)

Hi,
I am having some trouble understanding how to step the environment in the latest release. I am currently using the 3DBall scenario, and I do not really understand what sometimes happens after calling env.get_steps(behavior_name):

for behavior_name in behavior_names:
    env.set_actions(behavior_name, actions[behavior_name])

env.step()

for behavior_name in behavior_names:
    decision_steps, terminal_steps = env.get_steps(behavior_name)
    print(len(decision_steps), len(terminal_steps))

After a couple of steps, one of the agents in the scene terminates, and I receive a single entry in terminal_steps but none in decision_steps.

How am I supposed to get the information for the remaining agents?
Thanks!


Hi

In the 3DBall environment, there are 12 agents requesting decisions every 5 fixed updates. If one of the Agents drops the ball between two decision requests, before any of the others, this Agent will NOT wait for the other Agents to request decisions before signaling to Python that it dropped the ball. This means that Python will receive 1 terminal step and 0 decision steps (since one agent terminated and the others have not requested a decision yet). The information is not lost: in Python, you can look at the relevant data of the terminated agent, then call env.step to advance the simulation (until another agent terminates or requests a decision), and call env.get_steps again to retrieve the new data.

In ML-Agents, data is communicated to Python only when an Agent requests a decision or terminates, not in between. Agents are not required to request decisions or terminate in sync.

Here is an illustration of what is going on in Unity and Python (n/a means that no message was exchanged at all):

Unity :
agent 1           :   decision |       |         |                    |              |            | termination & decision          |
agent 2           :   decision |       |         |                    |              |            | decision                        |
agent 3           :   decision |       |         | termination        |              |            | decision                        |

Python :
env.get_steps     :   (3, 0)   | n/a   | n/a     |  (0, 1)            | n/a          | n/a        | (3, 1)                          |

This means that in Python, if you call

env.step()
decision, terminal = env.get_steps(behavior_name)
print(len(decision), len(terminal))

three times, you will see the following (decision, terminal) counts:

(3,0)
(0,1)
(3,1)
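
As a rough sketch of how a loop can cope with these messages, the snippet below replays the three messages from the illustration through a small stub. ReplayEnv, Step, and collect are hypothetical names made up for this example, the reward values are placeholders, and the stub ignores the behavior name; only env.step and env.get_steps mirror the calls used above. Plain dicts stand in for decision_steps and terminal_steps, assuming that only len() and iteration over agent ids are needed.

```python
from collections import namedtuple

# Hypothetical stand-in for one agent's entry. Only len() and iteration over
# agent ids are used below, which plain dicts also support.
Step = namedtuple("Step", ["reward"])


class ReplayEnv:
    """Hypothetical stub that replays the three messages from the illustration."""

    def __init__(self):
        # One (decision_steps, terminal_steps) pair per message, keyed by
        # agent id. The reward values are placeholders.
        self._messages = [
            ({1: Step(0.0), 2: Step(0.0), 3: Step(0.0)}, {}),
            ({}, {3: Step(-1.0)}),
            ({1: Step(0.0), 2: Step(0.0), 3: Step(0.0)}, {1: Step(-1.0)}),
        ]
        self._index = -1

    def step(self):
        # Advance to the next message (a decision request or a termination).
        self._index += 1

    def get_steps(self, behavior_name):
        return self._messages[self._index]


def collect(env, behavior_name, num_messages):
    counts, terminated = [], []
    for _ in range(num_messages):
        env.step()
        decision_steps, terminal_steps = env.get_steps(behavior_name)
        counts.append((len(decision_steps), len(terminal_steps)))
        # Read the terminated agents' data here, before stepping again.
        terminated.extend(terminal_steps)
        if len(decision_steps) == 0:
            # No agent is waiting for an action, so just step again.
            continue
        # With a real environment, this is where actions for the agents in
        # decision_steps would be computed and passed to env.set_actions.
    return counts, terminated


counts, terminated = collect(ReplayEnv(), "3DBall", 3)
print(counts)      # [(3, 0), (0, 1), (3, 1)]
print(terminated)  # [3, 1]
```

The point of the sketch is that an empty decision_steps is not an error: the loop records the terminal data it received and simply steps again.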

Thanks a lot, that clarifies a lot.

I have an additional question which probably does not apply to the 3DBall environment, but which may be relevant for me later on.

In the 3DBall scenario, is there a way to receive fewer than 12 agents (and more than 0, of course) inside decision_steps? What I mean is, are there any cases where agents may request decisions at different frequencies? In the examples above, even when an agent terminates, it still resets in time to send information back together with the other agents.

Thanks again for the help


You are perfectly right: Agents are not required to request decisions or terminate in sync.
If you have 3 agents with a decision requester set to request a decision every 5 fixed updates and 2 agents with a decision requester set to request a decision every 7 fixed updates, you will receive decision steps as follows:
after 5 steps : 3 agents
after 7 steps : 2 agents
after 10 steps : 3 agents
after 14 steps : 2 agents
after 15 steps : 3 agents
after 20 steps : 3 agents again
after 21 steps : 2 agents
…
after 35 steps : 3 + 2 agents (35 divisible by 5 and 7)

Note that if you do not use a decision requester and manually request decisions with “Agent.RequestDecision” then you can have much more complex scenarios.
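
The schedule listed above can be reproduced with a few lines of Python. This is only a simplified model, not how ML-Agents computes it: the function name decision_schedule and its arguments are made up for this sketch, and it assumes that every agent starts counting at the same fixed update and that no agent terminates along the way.

```python
def decision_schedule(agents_per_period, num_updates):
    """Return {fixed_update: number of agents requesting a decision}.

    agents_per_period maps a decision period to the number of agents using
    it. Assumption: an agent with period p requests a decision whenever the
    fixed update count is divisible by p.
    """
    schedule = {}
    for update in range(1, num_updates + 1):
        count = sum(
            n for period, n in agents_per_period.items() if update % period == 0
        )
        if count:
            schedule[update] = count
    return schedule


# 3 agents with a period of 5 and 2 agents with a period of 7.
schedule = decision_schedule({5: 3, 7: 2}, 35)
for update, count in schedule.items():
    print(f"after {update} steps : {count} agents")
```

Fixed updates on which nobody requests a decision do not appear in the result, which matches the "n/a" slots in the earlier illustration.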


That clarifies a lot. Thanks for taking the time.

[QUOTE="vincentpierre, post: 6418421, member: 1370256"]
Unity :
agent 1           :   decision |       |         |                    |              |            | termination & decision          |
agent 2           :   decision |       |         |                    |              |            | decision                        |
agent 3           :   decision |       |         | termination        |              |            | decision                        |
Python :
env.get_steps     :   (3, 0)   | n/a   | n/a     |  (0, 1)            | n/a          | n/a        | (3, 1)                          |
[/QUOTE]

Can you explain what "termination & decision" means for agent 1? I understand that Agents are not required to request decisions or terminate in sync, which is why there was only a termination (0,1) and no decision request when agent 3 terminated. But what does (3,1) mean?

(3,1) means that agent 1 terminated (you can find its last observation and final reward inside terminal_steps) and then managed to reset in time to return a new observation (the first observation of a new episode, inside decision_steps).
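
One practical consequence when tracking per-agent episode returns is that an agent can appear in both terminal_steps and decision_steps in the same message. A possible way to handle it, sketched below, is to close the finished episode from terminal_steps first and only then start the new one from decision_steps. The helper update_returns and the Step tuple are hypothetical, the reward values are placeholders, and the sketch assumes only that both objects can be iterated over agent ids and indexed by agent id to get an entry with a reward attribute.

```python
from collections import namedtuple

# Hypothetical stand-in for one agent's entry in either mapping.
Step = namedtuple("Step", ["reward"])


def update_returns(running, finished, decision_steps, terminal_steps):
    """Update per-agent returns from one (decision_steps, terminal_steps) pair."""
    # 1) Close finished episodes: add the final reward and drop the running sum.
    for agent_id in terminal_steps:
        total = running.pop(agent_id, 0.0) + terminal_steps[agent_id].reward
        finished.append((agent_id, total))
    # 2) Continue (or start) episodes for the agents requesting a decision.
    for agent_id in decision_steps:
        reward = decision_steps[agent_id].reward
        running[agent_id] = running.get(agent_id, 0.0) + reward


# The (3,1) message: agent 1 terminated and already started a new episode.
running = {1: 0.5, 2: 0.5, 3: 0.25}
finished = []
decision_steps = {1: Step(0.0), 2: Step(0.25), 3: Step(0.25)}
terminal_steps = {1: Step(-1.0)}
update_returns(running, finished, decision_steps, terminal_steps)
print(finished)  # [(1, -0.5)]
print(running)   # {2: 0.75, 3: 0.5, 1: 0.0}
```

Processing the two mappings in the opposite order would add the first reward of agent 1's new episode to the return of the old one.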


You are exactly right!

Thanks, now it’s clear.