I spent quite some time trying to understand exactly how the life cycle of the Unity Environment is intertwined with the life cycle of TensorFlow, but I cannot find this information in the documentation. For teaching purposes, I would like to create a schema similar to this one:
albeit a correct one!
Any help or reference would be greatly appreciated, thanks!
Hi,
There is no documentation on this topic, unfortunately.
This is what happens (simplified) in the UnityEnvironment:
- During each fixed update, the Academy (the orchestrator) calls CollectObservations on all Agents that requested a decision since the last fixed update (note that RequestDecision can be called anytime, from anywhere, on the Agent).
- Once all relevant observations are collected, the data is communicated to Python. The data includes all observations and all rewards that were collected since the last fixed update (note that AddReward can be called anytime, from anywhere, on the Agent). It also includes some other useful information, such as whether or not the Agent terminated since the last fixed update.
- Python selects an action and sends it to the UnityEnvironment.
- The Academy calls OnActionReceived on all of the Agents that requested a decision since the last fixed update.
- The rest of the fixed update and the other game-loop callbacks unfold. (A minimal Agent sketch illustrating these hooks follows this list.)
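To make the loop concrete, here is a minimal sketch of an Agent wired into those steps. It assumes the release_7-era API, where OnActionReceived receives a float[]; the class name, the target field, and the movement logic are purely illustrative, not part of any official sample:

```csharp
using Unity.MLAgents;
using Unity.MLAgents.Sensors;
using UnityEngine;

// Hypothetical example agent that chases a target along the x axis.
public class MoveToTargetAgent : Agent
{
    public Transform target;  // assigned in the Inspector (illustrative)

    void FixedUpdate()
    {
        // Queue a decision; the Academy will pick it up during its own
        // fixed-update step and call CollectObservations below.
        RequestDecision();
    }

    // Step 1 of the loop: the Academy collects observations from every
    // Agent with a pending decision request.
    public override void CollectObservations(VectorSensor sensor)
    {
        sensor.AddObservation(transform.localPosition);
        sensor.AddObservation(target.localPosition);
    }

    // Step 4 of the loop: the Academy dispatches the action Python chose
    // back to the Agent.
    public override void OnActionReceived(float[] vectorAction)
    {
        transform.Translate(vectorAction[0] * Time.fixedDeltaTime, 0f, 0f);
    }
}
```

In practice, the built-in DecisionRequester component can issue RequestDecision at a fixed interval instead of calling it by hand as above.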
It is important to note that RequestDecision and AddReward can be called at any point in the game loop. We recommend calling AddReward in the OnActionReceived method, because it is easier for the Agents to learn when the rewards are direct consequences of their actions; a sketch of that pattern follows.
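For example, the reward bookkeeping could live directly in OnActionReceived (continuing the hypothetical agent from the previous sketch; the shaping penalty, success threshold, and reward values are made up for illustration):

```csharp
// Inside the hypothetical MoveToTargetAgent from the previous sketch:
public override void OnActionReceived(float[] vectorAction)
{
    transform.Translate(vectorAction[0] * Time.fixedDeltaTime, 0f, 0f);

    // The reward is a direct consequence of the action just applied,
    // which makes credit assignment easier to learn.
    float distance = Vector3.Distance(transform.localPosition,
                                      target.localPosition);
    AddReward(-0.01f * distance);  // illustrative shaping penalty

    if (distance < 0.5f)           // illustrative success threshold
    {
        AddReward(1f);
        EndEpisode();  // reported to Python as a termination next step
    }
}
```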
The hook into the fixed update is in this script: ml-agents/com.unity.ml-agents/Runtime/Academy.cs at release_7 · Unity-Technologies/ml-agents · GitHub. Note that it is possible to disable the automatic stepping in the fixed update and manually call Academy.Instance.EnvironmentStep(); to trigger an RL loop, as sketched below.
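A minimal sketch of that manual-stepping setup (AutomaticSteppingEnabled and EnvironmentStep are the Academy members involved; the MonoBehaviour wrapper and its name are hypothetical):

```csharp
using Unity.MLAgents;
using UnityEngine;

// Hypothetical driver that steps the RL loop by hand instead of
// relying on the Academy's automatic fixed-update hook.
public class ManualStepper : MonoBehaviour
{
    void Awake()
    {
        // Stop the Academy from stepping automatically each fixed update.
        Academy.Instance.AutomaticSteppingEnabled = false;
    }

    void FixedUpdate()
    {
        // Trigger one full RL loop (collect observations, exchange data
        // with Python, dispatch actions) whenever we choose to.
        Academy.Instance.EnvironmentStep();
    }
}
```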