[SOLVED] Adapting BufferSensor to single observations for inference

Hello,

We are writing a master's thesis in which we are adapting the dodgeball-ml-agents environment to apply neuroevolution with the python-NEAT package. We are only using the Elimination game mode and have adapted it to 1v1 instead of 4v4.

The original environment: GitHub - Unity-Technologies/ml-agents-dodgeball-env: Showcase environment for ML-Agents
Our current adaption: GitHub - Hallahallan/Dodgeball-Bio-fMRI

I am new to Unity and have trouble figuring out how BufferSensors work in practice. The environment was built for attention networks and, as I understand it, collects observations into the buffer. For neuroevolution, however, I am only interested in the current observation when running inference for agent behavior.

Currently the observations are heavily padded with zeros, which I have not yet made sense of. I checked that this is also the case with the original environment. By counting the consecutive floats I have guessed that:

Observation 1 - Seems to only report values in Capture the Flag, so I assume it's for the flag position.
Observation 2 - AgentRayCast (41 casts)
Observation 3 or 5 - Wall or Ball RayCast (21 casts)
Observation 4 - Unknown
Observation 6 - Back Raycast (3 casts)

Some seemingly random points in the observations are also filled with 1s, as in observation 5, for example.

  1. Does anyone have experience with this environment and either know whether documentation on the observation space exists, or has figured out which observation maps to which sensor?

  2. How can I either limit the buffer sensors to a single observation or replace them with a fixed, non-buffer sensor?

I appreciate all the help I can get!


Note that there are two similar but different concepts:

  • BufferSensor, which stacks up multiple observations from the same timestep, e.g. if there are an unknown number of balls or agents, those can be stacked up
  • Stacked Vector, where we stack up observations over multiple time steps

The latter is controlled from the Behavior Parameters settings, and is set to 1 in DodgeBall, so there is no stacking over multiple time steps.
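
To make the first point concrete, here is roughly how a BufferSensor gets filled: an illustrative sketch meant to sit inside the agent class, with names that do not match the DodgeBall scripts exactly. Everything appended during one CollectObservations call belongs to the same time step, and any slots that never get an entry are left as zeros up to Max Num Observables, which is where much of the zero padding you are seeing comes from:

// Illustrative fragment, not the exact DodgeBall code.
public BufferSensorComponent m_OtherAgentsBuffer;     // configured in the Inspector
                                                      // (Observable Size, Max Num Observables)

void ObserveOtherAgents(List<float[]> perAgentData)   // one fixed-size float[] per visible agent
{
    foreach (var entry in perAgentData)
    {
        // One entry per observed object, all from the same time step.
        // Slots that never receive an entry stay at zero, up to Max Num Observables.
        m_OtherAgentsBuffer.AppendObservation(entry);
    }
}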

The code that gets the observations is here: ml-agents-dodgeball-env/Assets/Dodgeball/Scripts/DodgeBallAgent.cs at b2915bd442f88bef391d8c380227f5be65bfae60 · Unity-Technologies/ml-agents-dodgeball-env · GitHub. You can see that it stacks up the observations for each agent. But with a fixed number of agents, as in your case, the stack size will always be the same. You can likely customize this code to return the observation of your choice.
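
If you go down that route for the 1v1 game, one option is to bypass the BufferSensor entirely and add the single opponent's data straight to the regular vector observations. This is a rough, untested sketch; the loop and names such as m_OtherPlayers, m_OtherAgentsBuffer and GetOtherAgentData are modeled on the linked script and should be checked against the actual code. You would also remove or disable the BufferSensorComponent on the agent prefab and increase the vector observation Space Size in Behavior Parameters by 8:

// Inside DodgeBallAgent.CollectObservations(VectorSensor sensor), replacing the
// loop that appends every other player to the BufferSensor.
foreach (var info in m_OtherPlayers)                       // hypothetical list of the other players
{
    if (info.Agent == this || !info.Agent.gameObject.activeInHierarchy)
    {
        continue;
    }

    // Original (variable number of agents, zero-padded up to Max Num Observables):
    // m_OtherAgentsBuffer.AppendObservation(GetOtherAgentData(info));

    // 1v1: there is exactly one opponent, so a fixed-size observation works.
    sensor.AddObservation(GetOtherAgentData(info));        // adds the same 8 floats
}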

Note that if Use Vector Obs is True, which it is by default, then a bunch of additional data is added to the observations: ml-agents-dodgeball-env/Assets/Dodgeball/Scripts/DodgeBallAgent.cs at b2915bd442f88bef391d8c380227f5be65bfae60 · Unity-Technologies/ml-agents-dodgeball-env · GitHub

if (UseVectorObs)
{
    sensor.AddObservation(ThrowController.coolDownWait); //Held DBs Normalized
    sensor.AddObservation(Stunned);
    Array.Clear(ballOneHot, 0, 5);
    ballOneHot[currentNumberOfBalls] = 1f;
    sensor.AddObservation(ballOneHot); //Held DBs Normalized
    sensor.AddObservation((float)HitPointsRemaining / (float)NumberOfTimesPlayerCanBeHit); //Remaining Hit Points Normalized
    sensor.AddObservation(Vector3.Dot(AgentRb.velocity, AgentRb.transform.forward));
    sensor.AddObservation(Vector3.Dot(AgentRb.velocity, AgentRb.transform.right));
    sensor.AddObservation(transform.InverseTransformDirection(m_HomeDirection));
    sensor.AddObservation(m_DashCoolDownReady); // Remaining cooldown, capped at 1
    // Location to base
    sensor.AddObservation(GetRelativeCoordinates(m_HomeBasePosition));
    sensor.AddObservation(HasEnemyFlag);
}

Thank you for the quick reply!
Now it makes sense why the observation stack size seems fixed.

Use Vector Obs seems to add 4 new values to observation 6, which makes sense, as a lot of them are 0 when nothing is happening, per the code posted above.

When observations are added to the sensor, does the order in which they are added in the Unity scripts determine the ordering when fetching the observations in the Python ML-Agents library? How can I pinpoint which observation in Python is linked to which observation added in Unity?


Is Max Num Observables the max number of observation stacks from the same time step?
What is observation size? This is for the other agent sensor, which has 41 rays, so I am not sure where 8 comes from.

When I lowered the stacked RayCasts to 1, the observations kind of shuffled themselves around in Python, and now I mostly only have 5 extra rows of zeros in the RayCast obs output (and some 1s). Do you have an idea of where the rest could come from?


Okay, I just understood the format of the RayCast data, where the columns with 0s and 1s are the booleans that recognize the different tags for each ray. I assume reducing the stacked RayCasts to 1 was okay, as it removed rows without information.

I have not figured out the order of the RayCast observations yet, but it may not be necessary. I suppose as long as the data is fed into the model in the same order/format each time, it will facilitate learning.

When observations are added to the sensor, does the order in which they are added in the Unity scripts determine the ordering when fetching the observations in the Python ML-Agents library?

Yes
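
One thing that can help with pinpointing the mapping (a hypothetical helper, not something in the DodgeBall scripts): every sensor component has a Sensor Name, and as far as I know ML-Agents orders an agent's sensors deterministically by that name, so logging (or renaming) the sensors on the agent makes it easier to tell which observation array on the Python side belongs to which sensor:

// Hypothetical helper on the agent GameObject; nothing here is DodgeBall-specific.
void LogSensorNames()
{
    foreach (var ray in GetComponentsInChildren<RayPerceptionSensorComponent3D>())
    {
        Debug.Log($"Ray sensor: {ray.SensorName}");
    }
    foreach (var buffer in GetComponentsInChildren<BufferSensorComponent>())
    {
        Debug.Log($"Buffer sensor: {buffer.SensorName}");
    }
}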

I have not figured out the order of the RayCast observations yet, but it may not be necessary. I suppose as long as the data is fed into the model in the same order/format each time, it will facilitate learning.

Right, as long as your network does not assume any structure in the input ordering (i.e. fully-connected / Linear layers, but not e.g. convolutional ones), the specific order does not matter, only that it is consistent.

When I lowered the stacked RayCasts to 1

Oh… so stacked RayCasts stack over time, to show movement. If you only want the current time step, then you do want to set these all to 1.
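
If it helps, the same thing can be done from code instead of clicking through every ray sensor in the Inspector. This sketch assumes ObservationStacks is exposed as a public property in the ML-Agents version you are on, and it has to run before the agent's sensors are created:

// Hypothetical helper: make every ray sensor on the agent report only the
// current time step (equivalent to setting "Stacked Raycasts" to 1 in the Inspector).
void DisableRaycastStacking()
{
    foreach (var ray in GetComponentsInChildren<RayPerceptionSensorComponent3D>())
    {
        ray.ObservationStacks = 1;
    }
}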

Is Max Num Observables the max number of observation stacks from the same time step?

Yes, see Class BufferSensorComponent | ML Agents | 2.2.1-exp.1

What is observation size? This is for the other agent sensor, which has 41 rays, so I am not sure where 8 comes from.

It’s the number of floats per observation. See Class BufferSensorComponent | ML Agents | 2.2.1-exp.1

You can see the data stored for each of the other agents here: ml-agents-dodgeball-env/Assets/Dodgeball/Scripts/DodgeBallAgent.cs at b2915bd442f88bef391d8c380227f5be65bfae60 · Unity-Technologies/ml-agents-dodgeball-env · GitHub

i.e.:

    private float[] GetOtherAgentData(DodgeBallGameController.PlayerInfo info)
    {
        var otherAgentdata = new float[8];
        otherAgentdata[0] = (float)info.Agent.HitPointsRemaining / (float)NumberOfTimesPlayerCanBeHit;
        var relativePosition = transform.InverseTransformPoint(info.Agent.transform.position);
        otherAgentdata[1] = relativePosition.x / m_LocationNormalizationFactor;
        otherAgentdata[2] = relativePosition.z / m_LocationNormalizationFactor;
        otherAgentdata[3] = info.TeamID == teamID ? 0.0f : 1.0f;
        otherAgentdata[4] = info.Agent.HasEnemyFlag ? 1.0f : 0.0f;
        otherAgentdata[5] = info.Agent.Stunned ? 1.0f : 0.0f;
        var relativeVelocity = transform.InverseTransformDirection(info.Agent.AgentRb.velocity);
        otherAgentdata[6] = relativeVelocity.x / 30.0f;
        otherAgentdata[7] = relativeVelocity.z / 30.0f;
        return otherAgentdata;
    }
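
So those two Inspector fields correspond to the ObservableSize and MaxNumObservables properties of the BufferSensorComponent, and each appended entry is exactly the float[8] above; on the Python side that sensor therefore shows up as a (Max Num Observables x 8) block with the unused rows left as zeros. For your 1v1 case you could also simply drop Max Num Observables to 1 on the agent prefab, or from code along these lines (a rough sketch, not from the repo; it has to run before the Agent creates its sensors):

// Rough sketch: shrink the other-agents buffer to a single entry for a 1v1 game,
// so there is one 8-float row and no zero-padded rows.
void Awake()
{
    var buffer = GetComponentInChildren<BufferSensorComponent>();
    buffer.MaxNumObservables = 1;
}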

Do you have an idea of where the rest could come from?

There are 4 separate raycast sensors inside each agent. If you take into account all of these, does this explain all of the zeros you are seeing?


I figured out that the extra zeros in the data from each RayCast were the booleans for RayCasts hitting labeled objects. I was a little confused about how it worked, but now I understand it clearly. The ObservationSize and MaxNumObservables made me realize they denote the OtherAgents data and not the AgentRayCast.

Thank you a lot, and I appreciate you taking time out of your day to help me get a grasp on this project!


I just noticed that there is one set of zeros more than necessary. For example, the WallRayCast has two detectable tags, yet the resulting array from the sensor contains three “columns” of zeros and a column for the distance the individual RayCast reached.

Why are there three columns of zeros when only two are needed for the two tags per RayCast? I have yet to observe the last column containing anything other than zero, and it seems like NEAT has not used a comparable number of nodes (no connections from those inputs).

I could simply modify the observations and remove the unnecessary columns, but I wish to know why this column is there in the first place and whether there is a way to remove it from the Unity side.

For anyone wondering about this in the future: I figured out that the last boolean represents whether or not the RayCast reached its full length without hitting any object.
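
To make that layout explicit for anyone else reading: as far as I can tell, each ray contributes (number of detectable tags + 2) floats, namely a one-hot slot per detectable tag, then the hit-nothing flag mentioned above, then the hit distance normalized by the ray length. A small decoding sketch (not from the repo, just written against that general ray sensor format):

// Decodes one ray's slice of a RayPerceptionSensor observation.
// Layout per ray: [one-hot per detectable tag, 1.0 if the ray hit nothing,
// hit distance as a fraction of the ray length].
static (bool missed, float hitFraction, int tagIndex) DecodeRay(float[] obs, int rayIndex, int numDetectableTags)
{
    int stride = numDetectableTags + 2;
    int start = rayIndex * stride;

    int tagIndex = -1;                                        // which detectable tag was hit, if any
    for (int t = 0; t < numDetectableTags; t++)
    {
        if (obs[start + t] > 0.5f)
        {
            tagIndex = t;
            break;
        }
    }

    bool missed = obs[start + numDetectableTags] > 0.5f;      // the "extra" column discussed above
    float hitFraction = obs[start + numDetectableTags + 1];   // normalized distance the ray reached
    return (missed, hitFraction, tagIndex);
}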