Agent Perception in 2D

Hi guys,

I’m training an agent, represented by a triangle in the image below. Its goal is to dodge the black circles and stay alive for as long as possible. The agent can shoot, and it moves by rotating around the z axis and accelerating. The circles spawn outside the area the agent is in and move through that area in random directions and at random velocities.

Yesterday I finally got it working! The agent was routinely reaching very long episode lengths.
In the end, what worked was a set of observations giving the agent its own position and velocity, plus a RayPerceptionSensor2D to detect the circles’ positions:

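```csharp
// Agent velocity expressed in its own frame
Vector2 localVelocity = rBody.transform.InverseTransformDirection(rBody.velocity);
sensor.AddObservation(localVelocity.x);
sensor.AddObservation(localVelocity.y);
// Position scaled to roughly [-1, 1] by the map's half-extents
sensor.AddObservation(transform.localPosition.x / (MapSetup.dimensions.x / 2));
sensor.AddObservation(transform.localPosition.y / (MapSetup.dimensions.y / 2));
// Note: rotation.z is the quaternion component sin(angle / 2), not an angle
sensor.AddObservation(transform.rotation.z);
```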

However, I think the RayPerceptionSensor2D is not the right tool for this job. For example, what if a circle (enemy) is behind another circle, as in the image below? I think that in this case the agent sees only one circle on its left and one on its right, instead of the three circles that actually exist, so it would shoot the first one and then rotate to the other side, leaving itself exposed to circle no. 2. Because of this, I think the agent cannot form an optimal strategy.

![](https://i.imgur.com/B6ooiYP.png) 
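To illustrate the occlusion, here is a minimal pure-C# sketch (not the ML-Agents implementation) of a single ray that, like an ordinary raycast, reports only the nearest circle it intersects. `Circle`, `RaySketch` and `NearestHit` are hypothetical names, and `System.Numerics.Vector2` stands in for Unity's `Vector2` so the sketch runs outside Unity.

```csharp
using System;
using System.Collections.Generic;
using System.Numerics;

// Hypothetical circle type: centre and radius in world units
public readonly struct Circle
{
    public readonly Vector2 Center;
    public readonly float Radius;
    public Circle(Vector2 center, float radius) { Center = center; Radius = radius; }
}

public static class RaySketch
{
    // Returns the index of the nearest circle hit by a ray from `origin` along the
    // unit vector `dir`, or -1 if nothing is hit. Circles further along the same
    // ray are never reported, which is the occlusion described above.
    public static int NearestHit(Vector2 origin, Vector2 dir, IList<Circle> circles, out float hitDistance)
    {
        int best = -1;
        hitDistance = float.PositiveInfinity;
        for (int i = 0; i < circles.Count; i++)
        {
            Vector2 toCenter = circles[i].Center - origin;
            float along = Vector2.Dot(toCenter, dir);                 // projection onto the ray
            float perpSq = toCenter.LengthSquared() - along * along;  // squared distance from the ray line
            float rSq = circles[i].Radius * circles[i].Radius;
            if (perpSq > rSq) continue;                               // the ray line misses this circle
            float t = along - MathF.Sqrt(rSq - perpSq);               // first intersection along the ray
            if (t >= 0f && t < hitDistance) { hitDistance = t; best = i; }
        }
        return best;
    }
}
```

With circles at x = 5 and x = 10 on the same ray, only the first one (index 0, at distance 4 for radius 1) is ever returned; the second is invisible to that ray.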

Also, if I understand correctly, each ray is really a SphereCast, so when a circle is detected the agent only knows the direction of the ray that touched it (a vector towards the center of the cast sphere), not the direction to the circle's own center. So, for example, in the situation below it would miss its shots.

![](https://i.imgur.com/wI0kn5P.png) 
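The geometry behind those misses, as a minimal sketch assuming a cast of radius `castRadius` and a thin projectile: a circle of radius `R` can be detected whenever its centre lies within `castRadius + R` of the ray line, but a shot fired straight along the ray only hits when the centre is within `R`. `CastAimSketch` and its methods are hypothetical names; `System.Numerics.Vector2` stands in for Unity's type.

```csharp
using System;
using System.Numerics;

public static class CastAimSketch
{
    // Perpendicular distance from a circle centre to the ray line through `origin`
    // with unit direction `dir`.
    public static float OffsetFromRay(Vector2 origin, Vector2 dir, Vector2 center)
    {
        Vector2 d = center - origin;
        return MathF.Abs(dir.X * d.Y - dir.Y * d.X); // magnitude of the 2D cross product
    }

    // True when a cast of radius castRadius along the ray would detect the circle,
    // but a thin shot fired along the same ray would pass beside it.
    // Assumes the circle lies in front of the agent.
    public static bool DetectedButMissed(Vector2 origin, Vector2 dir, Vector2 center,
                                         float circleRadius, float castRadius)
    {
        float offset = OffsetFromRay(origin, dir, center);
        return offset > circleRadius && offset < circleRadius + castRadius;
    }

    // Signed angle in degrees (positive = counter-clockwise) from the ray to the circle centre.
    public static float AngleToCenterDeg(Vector2 origin, Vector2 dir, Vector2 center)
    {
        Vector2 d = center - origin;
        float cross = dir.X * d.Y - dir.Y * d.X;
        float dot = Vector2.Dot(dir, d);
        return MathF.Atan2(cross, dot) * 180f / MathF.PI;
    }
}
```

For a ray along +x, a circle of radius 1 centred at (10, 1.5) and a cast radius of 1, the circle is detected but a shot along the ray misses it by an aiming error of about 8.5°.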

I have also tried a CameraSensor. The agent seems to learn to dodge somewhat, but shooting fails completely, and I wasn't able to get a working solution with it. It is also my least preferred method because, if I understand correctly, I would have to retrain the model whenever I change the sprites of the game objects.

I also tried pooling all the circles (as in object pooling) and adding their positions, velocities, and whether they are active in the scene to the observations. I tried with as few as 6 circles. I can't remember the exact total, but with the code below each circle contributes 8 values (1 active flag, 2 velocity, 2 position, 3 scale), so 6 circles add 48 observations. It didn't work at all. The agent learned to dodge very quickly, probably in fewer than 100,000 steps, but it wouldn't aim at all, and eventually even its dodging started to deteriorate.

```csharp
foreach (GameObject asteroid in map.asteroids.Values)
{
    // 1 value: whether this pooled asteroid is currently in play
    sensor.AddObservation(asteroid.activeSelf);
    Rigidbody2D arBody = asteroid.GetComponent<Rigidbody2D>();
    // 2 values: velocity in the asteroid's own frame
    // (equal to its world velocity when the asteroid has no rotation)
    Vector2 alocalVelocity = arBody.transform.InverseTransformDirection(arBody.velocity);
    sensor.AddObservation(alocalVelocity.x);
    sensor.AddObservation(alocalVelocity.y);
    // 2 values: position scaled by the map's half-extents
    sensor.AddObservation(asteroid.transform.localPosition.x / (MapSetup.dimensions.x / 2));
    sensor.AddObservation(asteroid.transform.localPosition.y / (MapSetup.dimensions.y / 2));
    // 3 values: localScale is a Vector3
    sensor.AddObservation(asteroid.transform.localScale);
}
```

I've also thought about splitting the agent into two tasks, one for movement and one for shooting, but since both tasks require z-axis rotation, I think they would be at odds with each other.

Can anyone recommend a better approach to vision? And can anyone comment on whether it's even plausible to make a perfect AI for this task, or should I just stick with the suboptimal solution I have right now?

Hello,

I think the “which is better” part of your question is somewhat conceptual: it depends on what you are trying to achieve by training your agent. If your primary goal is to solve the problem and you don’t mind giving the AI information it perhaps shouldn’t have, I’ve had success with the pooling method you described. The method can be made somewhat more realistic, e.g. by only observing objects that are within a sphere cast's range and directly visible to the agent. It seems very promising to me in general; however, it is quite hard to eyeball what information is required to make an informed decision. One thing I see missing from your pooling version (though I am not at all sure it is required) is rotation. Also, it might be more helpful to send the asteroid’s position relative to the agent.
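A minimal sketch of pooled, agent-relative observations, assuming a fixed number of slots, an agent heading in degrees around z with the agent facing +x at heading 0, and a hypothetical visibility radius `viewRadius`. Each visible asteroid's position and velocity are taken relative to the agent, rotated into the agent's frame and scaled; slots are filled nearest first, and unused slots stay zero with a presence flag of 0. In Unity each value would go to `sensor.AddObservation`; here they are returned as an array so the sketch runs on its own. `RelativeObsSketch`, `Build` and `maxSpeed` are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Numerics;

public static class RelativeObsSketch
{
    public const int ValuesPerSlot = 5; // present flag, relative position x/y, relative velocity x/y

    // Rotates a world-space vector into the agent's frame (agent faces +x at heading 0).
    static Vector2 ToLocal(Vector2 v, float headingDeg)
    {
        float a = -headingDeg * MathF.PI / 180f;
        float c = MathF.Cos(a), s = MathF.Sin(a);
        return new Vector2(c * v.X - s * v.Y, s * v.X + c * v.Y);
    }

    public static float[] Build(Vector2 agentPos, Vector2 agentVel, float headingDeg,
                                IEnumerable<(Vector2 pos, Vector2 vel)> asteroids,
                                int slots, float viewRadius, float maxSpeed)
    {
        var obs = new float[slots * ValuesPerSlot]; // zero-filled by default
        var visible = asteroids
            .Select(a => (rel: a.pos - agentPos, relVel: a.vel - agentVel))
            .Where(a => a.rel.Length() <= viewRadius)   // only asteroids within range
            .OrderBy(a => a.rel.LengthSquared())         // nearest first
            .Take(slots)
            .ToList();
        for (int i = 0; i < visible.Count; i++)
        {
            Vector2 p = ToLocal(visible[i].rel, headingDeg) / viewRadius; // roughly [-1, 1]
            Vector2 v = ToLocal(visible[i].relVel, headingDeg) / maxSpeed;
            int o = i * ValuesPerSlot;
            obs[o] = 1f;
            obs[o + 1] = p.X; obs[o + 2] = p.Y;
            obs[o + 3] = v.X; obs[o + 4] = v.Y;
        }
        return obs;
    }
}
```

Sorting by distance is one design choice among several; it keeps the most relevant asteroid in the same slot, which may be easier for the network than an arbitrary pool order.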

On the other hand, designing sensors is quite tiring, and the problem you are describing should be solvable with a standard raycast sensor. To avoid the need for velocities, you can either enable memory in the hyperparameters (which makes training longer) or enable observation stacking (which sends the last n observations as one).
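To make the stacking idea concrete, here is a minimal sketch (not ML-Agents' internal code; in ML-Agents this is the "Stacked Vectors" setting on Behavior Parameters) of a buffer that concatenates the last `n` observation vectors, oldest first, and pads with zeros at the start of an episode. `ObservationStacker` is a hypothetical name. Once two consecutive positions are in the input, velocity is implicitly available as their difference.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: keeps the last `stackSize` observation vectors and
// concatenates them (oldest first) into one flat input.
public sealed class ObservationStacker
{
    readonly int stackSize, obsSize;
    readonly Queue<float[]> history = new Queue<float[]>();

    public ObservationStacker(int stackSize, int obsSize)
    {
        this.stackSize = stackSize;
        this.obsSize = obsSize;
    }

    public float[] Push(float[] obs)
    {
        if (obs.Length != obsSize) throw new ArgumentException("Unexpected observation size");
        history.Enqueue((float[])obs.Clone());
        if (history.Count > stackSize) history.Dequeue(); // drop the oldest frame
        // Pad with zero vectors until the buffer is full (e.g. at episode start)
        var padding = Enumerable.Repeat(new float[obsSize], stackSize - history.Count);
        return padding.Concat(history).SelectMany(v => v).ToArray();
    }
}
```

For example, pushing positions (1, 2) and then (1.5, 2) with a stack of 2 yields [1, 2, 1.5, 2], from which the network can pick up the 0.5-per-step change in x.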

Another, more experimental approach I could suggest trying is a grid sensor (look in the repo).
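The appeal of a grid for the occlusion problem in the original question is that every circle marks its own cells, so nothing hides behind anything else. The ML-Agents grid sensor has its own API (see the repo); the sketch below only illustrates the idea, assuming an agent-centred square grid of `cells × cells` with side `cellSize` and circle centres given relative to the agent. `GridSketch` and `Occupancy` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Numerics;

public static class GridSketch
{
    // Builds a cells x cells occupancy grid centred on the agent.
    // A cell is marked 1 when its centre lies inside any circle.
    public static float[,] Occupancy(IEnumerable<(Vector2 relCenter, float radius)> circles,
                                     int cells, float cellSize)
    {
        var grid = new float[cells, cells];
        float half = cells * cellSize / 2f;
        foreach (var (c, r) in circles)
        {
            for (int row = 0; row < cells; row++)
            for (int col = 0; col < cells; col++)
            {
                // Centre of this cell in agent-relative coordinates
                var cellCenter = new Vector2(-half + (col + 0.5f) * cellSize,
                                             -half + (row + 0.5f) * cellSize);
                if (Vector2.DistanceSquared(cellCenter, c) <= r * r) grid[row, col] = 1f;
            }
        }
        return grid;
    }
}
```

Two circles lined up along +x from the agent both appear in the grid, whereas a single ray in that direction would report only the nearer one.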

If your primary concern is that the agent is not using its gun, I would suggest investigating your actions and rewards setup. I would assume that learning to avoid asteroids is somewhat simpler than learning to shoot them. In that case, I would try to rebalance the rewards so that movement is more costly than rotation and shooting.
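A minimal sketch of that rebalancing, assuming continuous thrust and turn actions in [-1, 1] and a count of asteroids destroyed this step. Every coefficient is a hypothetical starting point to tune, not a recommended value; in an ML-Agents `Agent` the result would typically be passed to `AddReward` once per step.

```csharp
using System;

public static class RewardSketch
{
    // Hypothetical coefficients: thrust costs more than turning,
    // and destroying an asteroid is rewarded, so shooting can pay off relative to fleeing.
    public const float AliveBonus = 0.001f;
    public const float ThrustCost = 0.002f;
    public const float TurnCost = 0.0005f;
    public const float HitReward = 0.1f;

    public static float StepReward(float thrust, float turn, int asteroidsDestroyed)
    {
        return AliveBonus
             - ThrustCost * Math.Abs(thrust)
             - TurnCost * Math.Abs(turn)
             + HitReward * asteroidsDestroyed;
    }
}
```

With these numbers a full-thrust step nets a small penalty while a full-turn step still nets a small positive reward, which nudges the agent towards turning and shooting rather than running away.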

Thank you for your help. I don’t mind giving the AI more information. Asteroid rotation is not required, since the asteroids do not rotate. I’ve also tried adding the asteroids’ positions relative to the agent; training didn’t work, but maybe that was due to something else, because none of my attempts with the pooling method really worked as expected. I think I implemented it incorrectly.

I have tried giving the position of the asteroid relative to the agent. My thinking was: if I give the asteroid's position relative to the agent, this Vector2 is a direction that the agent can multiply by -1 to get the vector it should rotate towards before firing a projectile. If it’s not multiplied by -1, it is the distance vector between the asteroid and the agent. Is it possible for the agent to figure this out? Or should I pass it as two parameters: the vector, so the agent knows the distance, and the vector * -1, so it knows which direction to rotate to and fire a projectile?
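A negation is trivial for a network to learn (a single negative weight), so passing the same vector twice with opposite signs adds little. What may help more is splitting the vector into an explicit distance, a unit direction and the signed angle the agent must turn. A minimal sketch, assuming `rel = asteroidPos - agentPos`, an agent that faces +x at heading 0, and a heading in [0, 360) as `localEulerAngles.z` returns; `TargetSketch` and `DescribeTarget` are hypothetical names.

```csharp
using System;
using System.Numerics;

public static class TargetSketch
{
    // Splits the agent-to-asteroid vector into distance, unit direction and the
    // signed turn (degrees, positive = counter-clockwise) needed to face the asteroid.
    public static (float distance, Vector2 direction, float turnDeg) DescribeTarget(
        Vector2 agentPos, float headingDeg, Vector2 asteroidPos)
    {
        Vector2 rel = asteroidPos - agentPos;
        float distance = rel.Length();
        Vector2 direction = distance > 0f ? rel / distance : Vector2.Zero;
        float targetDeg = MathF.Atan2(rel.Y, rel.X) * 180f / MathF.PI;
        // Wrap the difference into [-180, 180); assumes headingDeg is in [0, 360)
        float turn = (targetDeg - headingDeg + 540f) % 360f - 180f;
        return (distance, direction, turn);
    }
}
```

For instance, an agent at heading 350° with an asteroid straight along +x gets a turn of +10°, rather than the misleading -350° a naive subtraction would give.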

Either way, this would still be imprecise, because asteroids can have velocity, and without observation stacking the agent cannot compensate for it. I can add the velocity to the observations as well, or use observation stacking, but in what I have tried, that only increases the number of observations.
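What compensating for velocity requires can be written down exactly. To hit an asteroid at relative position `p` moving with relative velocity `v` using a projectile of speed `s`, the agent must aim at `p + v t`, where `t` is the smallest positive root of `|p + v t| = s t`, i.e. `(v·v - s²) t² + 2 (p·v) t + p·p = 0`. A minimal sketch, assuming straight-line motion and a projectile speed measured relative to the agent (so `v` is the asteroid's velocity minus the agent's); `InterceptSketch` and `AimDirection` are hypothetical names.

```csharp
using System;
using System.Numerics;

public static class InterceptSketch
{
    // Returns the unit direction to fire in, or null if the projectile can never reach the target.
    public static Vector2? AimDirection(Vector2 p, Vector2 v, float projectileSpeed)
    {
        float a = Vector2.Dot(v, v) - projectileSpeed * projectileSpeed;
        float b = 2f * Vector2.Dot(p, v);
        float c = Vector2.Dot(p, p);
        float t;
        if (MathF.Abs(a) < 1e-6f)
        {
            // Target as fast as the projectile: the equation is linear, b t + c = 0
            if (b >= 0f) return null;
            t = -c / b;
        }
        else
        {
            float disc = b * b - 4f * a * c;
            if (disc < 0f) return null;
            float sq = MathF.Sqrt(disc);
            float t1 = (-b - sq) / (2f * a);
            float t2 = (-b + sq) / (2f * a);
            // Pick the smallest positive root
            t = (t1 > 0f && (t1 < t2 || t2 <= 0f)) ? t1 : t2;
            if (t <= 0f) return null;
        }
        return Vector2.Normalize(p + v * t);
    }
}
```

For an asteroid at (0, 10) moving at (3, 0) and a projectile speed of 5, the intercept time is 2.5, so the agent should fire towards (7.5, 10), i.e. direction (0.6, 0.8), not straight at the asteroid's current position. This is the relationship the network has to approximate when it is given position and velocity (or stacked positions).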

I’m also passing the agent's velocity and position, because if I pass just the position and rotation with observation stacking, that is 3 × 2 = 6 observations, from which the agent can infer x, y, x velocity, y velocity, z rotation and angular velocity.
But if I give 5 × 1 = 5 observations, as I do now, I only lose the angular velocity, which I don’t think matters that much. And maybe this is easier for the agent to understand? It doesn’t have to find the relation between two stacked numbers; it can just read the values directly.

I will continue to experiment with the rewards, and I’ll take a look at the grid sensor too!

How did you set up the observations for objects within the agent’s sphere cast when using pooling? What I was doing is instantiating a group of objects at the start with activeSelf == false and a large Z position (my game is 2D, so this just puts them out of the way). When it is time to spawn one, I select a gameObject with activeSelf == false from the dictionary they are stored in and move it into the scene; when the object is used up, it is set inactive again and moved back to a large Z position. Do I have the right idea?
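The bookkeeping of that pooling pattern, as a minimal Unity-free sketch: a hypothetical `PooledAsteroid` class stands in for the GameObject, and its `Active` flag plays the role of `activeSelf`/`SetActive`. Because the pool has a fixed number of slots, iterating over it always yields the same number of observations, with inactive slots available to be observed as zeros.

```csharp
using System;
using System.Collections.Generic;
using System.Numerics;

// Hypothetical stand-in for an asteroid GameObject
public sealed class PooledAsteroid
{
    public bool Active;          // plays the role of activeSelf / SetActive
    public Vector2 Position;
    public Vector2 Velocity;
}

public sealed class AsteroidPool
{
    readonly List<PooledAsteroid> slots = new List<PooledAsteroid>();

    public AsteroidPool(int size)
    {
        for (int i = 0; i < size; i++) slots.Add(new PooledAsteroid());
    }

    // Fixed-size view, e.g. for building a constant-length observation vector
    public IReadOnlyList<PooledAsteroid> Slots => slots;

    // Activates the first inactive asteroid, or returns null if the pool is exhausted.
    public PooledAsteroid Spawn(Vector2 position, Vector2 velocity)
    {
        foreach (var a in slots)
        {
            if (a.Active) continue;
            a.Active = true;
            a.Position = position;
            a.Velocity = velocity;
            return a;
        }
        return null;
    }

    // Returns the asteroid to the pool so a later Spawn can reuse it.
    public void Despawn(PooledAsteroid a) => a.Active = false;
}
```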

If you want training to go better, you should normalize the observations.
Normalization is an important technique in machine learning, especially when working with Unity ML-Agents environments: it scales the observations to a small, consistent range, typically -1 to 1 or 0 to 1.

Here is example code:

```csharp
// Velocity in the agent's own frame (Rigidbody2D.velocity is a Vector2)
Vector2 localVelocity = rBody.transform.InverseTransformDirection(rBody.velocity);
// Scale by the agent's top speed so the values stay roughly in [-1, 1];
// maxSpeed is a placeholder for whatever speed cap your movement code uses
sensor.AddObservation(localVelocity.x / maxSpeed);
sensor.AddObservation(localVelocity.y / maxSpeed);
// Direction of travel only (unit vector; zero when the agent is stationary)
sensor.AddObservation(localVelocity.normalized.x);
sensor.AddObservation(localVelocity.normalized.y);
// A 2D agent only rotates around z, so the x and y Euler angles are always 0;
// observe the z angle mapped to [0, 1] instead
sensor.AddObservation(transform.localEulerAngles.z / 360.0f);
// Position scaled to roughly [-1, 1] by the map's half-extents
sensor.AddObservation(transform.localPosition.x / (MapSetup.dimensions.x / 2));
sensor.AddObservation(transform.localPosition.y / (MapSetup.dimensions.y / 2));
// Unit quaternion: its 4 components already lie in [-1, 1]
sensor.AddObservation(transform.localRotation.normalized);
```
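Two small helpers in the same spirit, as a minimal sketch with hypothetical names: one maps a value from a known range to [-1, 1] and clamps outliers, and one encodes the heading as its sine and cosine, which avoids the jump from 359° back to 0° that `localEulerAngles.z / 360` has.

```csharp
using System;

public static class NormalizeSketch
{
    // Linearly maps value from [min, max] to [-1, 1], clamping anything outside the range.
    public static float ToUnitRange(float value, float min, float max)
    {
        float t = 2f * (value - min) / (max - min) - 1f;
        return Math.Clamp(t, -1f, 1f);
    }

    // Encodes a heading in degrees as (sin, cos), which is continuous across 359° to 0°.
    public static (float sin, float cos) EncodeHeading(float headingDeg)
    {
        float rad = headingDeg * MathF.PI / 180f;
        return (MathF.Sin(rad), MathF.Cos(rad));
    }
}
```

In the agent, the two heading values would replace the single angle observation, e.g. one `sensor.AddObservation` call for `sin` and one for `cos`; headings of 359° and 1° then produce nearly identical inputs.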