I have a question about the AddReward and SetReward functions.
Based on my understanding of the documentation, AddReward accumulates rewards over the course of an episode, while SetReward replaces the accumulated reward with a fixed value:
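To make my reading concrete, here is a toy bookkeeping sketch of what I believe the two calls do (this is just my interpretation of the docs, not the real ML-Agents implementation):

```csharp
using System;

class RewardSketch
{
    // Stand-in for the agent's cumulative episode reward
    static float episodeReward = 0f;

    // My understanding: AddReward sums into the episode total
    static void AddReward(float r) { episodeReward += r; }

    // My understanding: SetReward overwrites the episode total
    static void SetReward(float r) { episodeReward = r; }

    static void Main()
    {
        AddReward(-2f);  // step penalty
        AddReward(-1f);  // another step penalty, total is now -3
        SetReward(0f);   // on my reading, this should reset the total to 0
        Console.WriteLine(episodeReward);
    }
}
```

If this interpretation were correct, the episode reward reported for a successful episode should end up at 0.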
I am developing a simulator where the agent's reward is based on its distance to the target: the farther away it is, the larger the negative reward; the closer it gets, the smaller the penalty, approaching 0.
However, in my simulation, I am not observing this behavior:
My Code:
public override void OnActionReceived(ActionBuffers actionBuffers)
{
    var act = actionBuffers.ContinuousActions;
    Vector3 move = Vector3.zero;
    move.x = act[0];
    move.y = act[1];
    move.z = act[2];
    agent_rb.AddForce(move * forceMove);

    Vector3 positionToTarget = target - transform.localPosition;

    // Add a penalty at every step for being far from the target
    AddReward(-positionToTarget.magnitude);

    // When the agent reaches the target
    if (positionToTarget.magnitude <= 0.5f)
    {
        print("Arrived");
        SetReward(0); // Set the accumulated reward to 0
        EndEpisode();
    }

    // If the agent falls out of bounds
    if (transform.position.y < 0f)
    {
        EndEpisode();
    }
}
Rewards in TensorBoard:
We can see that the reward drops because I am using curriculum learning to progressively increase the distance to the target; however, the reward is not 0 even when the agent reaches the goal.
Thank you.
