I basically have an agent that does this:
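    public override void OnActionReceived(ActionBuffers actionBuffers)
    {
        // Reward +1 when the first continuous action rounds to a negative integer, otherwise -1.
        if (Mathf.RoundToInt(actionBuffers.ContinuousActions[0]) < 0)
            SetReward(1f);
        else
            SetReward(-1f);

        // Every episode lasts exactly one step.
        EndEpisode();
    }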
I would assume this would learn to pick floats that round to -1. Instead it seems to do the opposite…
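To make the expected behaviour concrete, here is a minimal sketch of the same reward rule outside Unity. `RewardCheck` and `RewardFor` are hypothetical names used only for illustration, and `Math.Round` stands in for `Mathf.RoundToInt` on the assumption that the two agree for values that are not exact .5 ties.

```csharp
using System;

public static class RewardCheck
{
    // Same rule as in OnActionReceived: +1 if the action rounds to a
    // negative integer, otherwise -1.
    // Assumption: Math.Round is used in place of Unity's Mathf.RoundToInt.
    public static float RewardFor(float action)
    {
        return (int)Math.Round(action) < 0 ? 1f : -1f;
    }

    public static void Main()
    {
        // Sample actions on both sides of the -0.5 rounding boundary.
        foreach (float a in new[] { -1.0f, -0.6f, -0.4f, 0.0f, 0.7f })
        {
            Console.WriteLine($"{a} -> {RewardFor(a)}");
        }
    }
}
```

With this rule, only actions below -0.5 earn +1 (here -1.0 and -0.6); everything else, including small negative values like -0.4, earns -1. So I would expect the policy output to drift towards -1, not towards positive values.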
Can someone explain what’s wrong?