I’m trying to have my agent determine some coordinates based on an object’s position, but I keep having issues.
In my environment I have a ball that gets thrown at a random angle in the general direction of the agent. Both the ball and the agent are reset to random positions (within some X and Y constraints) at the beginning of each episode.
I set up some values to determine a rectangle at a fixed position relative to the agent. Since the agent’s Z position is fixed, all that’s required for the rectangle are X and Y coordinates.
The agent’s task is to determine which X and Y coordinates to move towards in order for the ball’s X and Y coordinates to always fall within the rectangle (which moves together with the agent).
I’m giving the agent a small positive reward (+0.1) every time it guesses the right position, and a large negative reward (-10) whenever it guesses wrong. On a wrong guess, the episode also ends.
So far I can’t get the agent to train properly. No matter how I change the hyperparameters and NN structure, its mean reward never rises above approximately -9.945.
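As a rough sanity check on that number, here is the arithmetic under one assumption that I have not verified: that every episode ends on a wrong guess rather than on the ball passing behind the agent. With k correct guesses before the wrong one, the episode return R would be:

$$
R = 0.1\,k - 10, \qquad R = -9.945 \;\Rightarrow\; k = \frac{10 - 9.945}{0.1} = 0.55
$$

If that assumption holds, the agent averages fewer than one correct guess per episode, which means it usually fails on the first or second step.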
Here’s my code for the rewards. Am I doing something blatantly wrong in here?
public GameObject ball;
public GameObject agent;
private Vector2 bounds;
private Vector2 estimatedTarget;

public override void CollectObservations(VectorSensor sensor) {
    sensor.AddObservation(new Vector2(ball.transform.localPosition.x, ball.transform.localPosition.y));
}

public override void OnActionReceived(ActionBuffers actions) {
    float estimateActionX = actions.ContinuousActions[0];
    float estimateActionY = actions.ContinuousActions[1];

    // Local coordinates of rectangle
    estimatedTarget = new Vector2(estimateActionX, estimateActionY);
    Vector2 lowBounds = new Vector2(estimatedTarget.x + bounds.x, estimatedTarget.y - bounds.y);
    Vector2 highBounds = new Vector2(estimatedTarget.x + 2f * bounds.x, estimatedTarget.y + bounds.y);

    // Reward based on correct estimation of target
    if (IsInRange(ball.transform.localPosition, lowBounds, highBounds)) {
        AddReward(0.1f);
    }
    else {
        AddReward(-10f);
        EndEpisode();
    }

    // Episode ends when the ball gets behind the agent
    if (ball.transform.localPosition.z <= agent.transform.localPosition.z) {
        EndEpisode();
    }
}

public bool IsInRange(Vector3 ball, Vector2 min, Vector2 max) {
    return ball.x >= min.x && ball.x <= max.x && ball.y >= min.y && ball.y <= max.y;
}
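To rule out the range check itself, here is a minimal Unity-free sketch of the same rectangle math on plain floats. The class name BoundsCheckSketch and the concrete values (a bounds of (0.5, 0.5) and an estimated target at the origin) are hypothetical, chosen only for illustration; they are not the values from my project.

```csharp
using System;

public static class BoundsCheckSketch
{
    // Same comparison as IsInRange in the agent, but on plain floats so it runs without Unity.
    public static bool IsInRange(float ballX, float ballY,
                                 float minX, float minY, float maxX, float maxY)
    {
        return ballX >= minX && ballX <= maxX && ballY >= minY && ballY <= maxY;
    }

    public static void Main()
    {
        // Hypothetical values, for illustration only.
        float boundsX = 0.5f, boundsY = 0.5f;
        float targetX = 0f, targetY = 0f;

        // Same rectangle construction as in OnActionReceived.
        float minX = targetX + boundsX;
        float minY = targetY - boundsY;
        float maxX = targetX + 2f * boundsX;
        float maxY = targetY + boundsY;

        // Ball exactly at the estimated target: the rectangle starts at targetX + boundsX.
        Console.WriteLine(IsInRange(0f, 0f, minX, minY, maxX, maxY));

        // Ball offset by 1.5 * boundsX along X, centered in Y.
        Console.WriteLine(IsInRange(0.75f, 0f, minX, minY, maxX, maxY));
    }
}
```

With these values the rectangle spans X from 0.5 to 1.0 and Y from -0.5 to 0.5, so it is offset from the estimated target along X rather than centered on it. A ball sitting exactly at the estimated X therefore counts as a miss whenever bounds.x is positive.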