I have a conceptual question. When training an agent to find a target, but there are obstacles in the way – in my case, a rigidbody helicopter in a city of skyscrapers, trying to reach a random transform – how can I properly reward and punish the agent so that it isn't punished for taking justified detours around a skyscraper?
My current approach is to reward the agent as it gets closer to the target, but I can see it then constantly bumping against skyscrapers because it has found a local maximum of sorts:
if (isCloseToTarget && isSlow)
{
    // Terminal success reward; Done() ends the episode
    // (EndEpisode() in newer ML-Agents versions).
    SetReward(1f);
    Done();
}
else if (previousDistanceToTarget != null)
{
    if (distanceToTarget < previousDistanceToTarget)
    {
        // AddReward rather than SetReward, so the shaping rewards
        // accumulate instead of overwriting each other between decisions.
        AddReward(0.01f);
    }
    else if (distanceToTarget > previousDistanceToTarget)
    {
        AddReward(-0.01f);
    }
    else
    {
        // Small penalty for hovering at the same distance.
        const float punishmentPerTimeWasted = -0.001f;
        AddReward(punishmentPerTimeWasted);
    }
}
// previousDistanceToTarget is a float?, null until the first step.
previousDistanceToTarget = distanceToTarget;
I suppose another approach would be to not reward for getting closer or further at all (and just reward once on win), but I understand this can make training much longer.
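For comparison, that sparse variant would just be the terminal branch from the snippet above, reusing the same isCloseToTarget / isSlow flags:

// Sparse variant: no distance shaping at all; the only learning signal
// is the terminal reward, so early training is mostly blind exploration.
if (isCloseToTarget && isSlow)
{
    SetReward(1f);
    Done();
}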
For penalties, you could use collisions, raycasts, or both. If you already have raycast detection for your observations, you can set a proximity threshold: if hit.distance drops below that value, meaning the agent flies too close to an obstacle without hitting it, you can penalize it inversely proportional to that distance.
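A minimal sketch of that idea, assuming the ray is cast forward from the agent; proximityThreshold and penaltyScale are made-up tuning values, and the penalty here scales linearly with closeness rather than as a strict 1/distance, which would blow up near contact:

RaycastHit hit;
const float proximityThreshold = 10f;   // meters; hypothetical value
const float penaltyScale = 0.005f;      // hypothetical value

if (Physics.Raycast(transform.position, transform.forward, out hit, proximityThreshold))
{
    // closeness is 0 at the threshold and approaches 1 near contact,
    // so the penalty grows the closer the agent skims the obstacle.
    float closeness = 1f - (hit.distance / proximityThreshold);
    AddReward(-penaltyScale * closeness);
}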
My agent managed to fly some minor detours through the city grid, but struggled with larger ones. I suspect it comes down to the angle to the target when using the vector dot product: flying orthogonal to the target direction doesn't yield any positive reward in this case.
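To illustrate, a sketch of that dot-product shaping, assuming target is the goal Transform and heliRigidbody is the helicopter's Rigidbody (both hypothetical field names):

Vector3 toTarget = (target.position - transform.position).normalized;
float alignment = Vector3.Dot(heliRigidbody.velocity.normalized, toTarget);
// alignment is 1 when flying straight at the target, 0 when flying
// orthogonal to it, and -1 when flying directly away - so a sideways
// detour around a skyscraper earns no positive reward at all.
AddReward(0.01f * alignment);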
Thanks! Penalizing for obstacles makes sense. Maybe for starters I can collect isColliding as a new observed bool signal and then penalize for colliding – after all, that would also damage a real helicopter, so it makes double sense.
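A rough sketch of what I mean, staying with the same ML-Agents version as my snippet above (the SetReward()/Done() era, where observations are added via AddVectorObs); the penalty value is a guess to be tuned:

private bool isColliding;

void OnCollisionEnter(Collision collision)
{
    isColliding = true;
    AddReward(-0.1f);   // one-off penalty for the impact itself
}

void OnCollisionExit(Collision collision)
{
    isColliding = false;
}

public override void CollectObservations()
{
    AddVectorObs(isColliding);   // expose contact state to the policy
}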