Hi,
It is recommended that you set your rewards in OnActionReceived() in order to ensure that your rewards are associated with the correct observation/action pair. Otherwise, the rewards may end up attached to a different observation/action pair than the one you intend.
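For example, a minimal sketch of what that looks like (the agent class, the `MoveAgent`/`ReachedGoal` helpers, and the reward values are all illustrative, not from the original post):

```csharp
using Unity.MLAgents;
using Unity.MLAgents.Actuators;

public class ExampleAgent : Agent
{
    // Rewards added inside OnActionReceived are reported for the same
    // step as the observations/action that triggered this call, so the
    // reward is credited to the correct observation/action pair.
    public override void OnActionReceived(ActionBuffers actions)
    {
        MoveAgent(actions.ContinuousActions); // hypothetical movement helper

        if (ReachedGoal())                    // hypothetical success check
        {
            AddReward(1.0f);
            EndEpisode();
        }
        else
        {
            AddReward(-0.001f);               // small per-step penalty
        }
    }

    void MoveAgent(ActionSegment<float> act) { /* apply forces here */ }
    bool ReachedGoal() { return false; /* placeholder condition */ }
}
```

By contrast, calling AddReward() from your own Update() or FixedUpdate() code risks the reward landing on a different step than the action that earned it.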
When OnActionReceived() is called (formerly AgentAction() ), that means that outputs have been generated from the neural network on that tick, right?
If so, does that mean rewards defined on a given decision tick actually associate themselves with the last decision tick’s action/state pair?
If not, then wouldn’t this depend on whether the physics update occurs before or after the AddReward() call in that same tick?
What I’m guessing is that unless the AddReward() happens before a decision is requested on a decision tick (and therefore before any outputs have a chance to change), it will just get applied to the next decision tick’s action/state pair.
I have often wondered about the underlying rules of where rewards can or should be set…
Yes the reward data is sent along with the observations for a particular step.
As long as you are adding your rewards in the OnActionReceived method, you won’t need to worry about the physics system updates. Currently, the Agent methods are triggered by hooking into the FixedUpdate loop, but this is subject to change between major version updates.