Hey there!
I’m working on a demo to show how casual game studios could build NPCs with RL (instead of Behavior Trees, FSMs, etc.), so I modified the Tanks game that Unity released in 2015.
It’s a multi-agent environment with 2 tanks that each need to kill their opponent. I train the agents with PPO (Proximal Policy Optimization).
After some very good advice and feedback from this thread, I’ve updated my environment.
So, I’m using the GridSensor provided by Eidos in the MLAgents.Extension package. I read the documentation (thanks again Luke-Houlihan). It’s an amazing sensor system, but there are things in the documentation that I don’t understand.
The tank observation looks like this:
- A GridSensor (that detects tags, shell height position, and enemy health).
- A vector that contains the localRotation of the turret (in order to help the agent to shoot in the correct direction).
- A vector that contains its own health.
- A bool that indicates whether the agent can shoot (to avoid spamming like hell).
What I’ve done
I’ve overridden the GetObjectData method of the GridSensor because, by default, only the tag information is used. In addition to the tag, I need the enemy health and the shell height (to know whether the shell is about to touch the floor, i.e. its height is close to 0, and hence explode).
protected override float[] GetObjectData(GameObject currentColliderGo,
    float typeIndex, float normalized_distance)
{
    // One value per channel; ChannelDepth.Length = 4 in this example
    float[] channelValues = new float[ChannelDepth.Length];
    // Channel 0: index of the detected tag
    channelValues[0] = typeIndex;

    Rigidbody goRb = currentColliderGo.GetComponent<Rigidbody>();
    if (goRb != null)
    {
        Debug.Log(goRb.gameObject.name);
        // Channel 1: height of the detected object (used for the shell height)
        channelValues[1] = goRb.position.y;
        Debug.Log("channelValues[1] = " + channelValues[1]);
        // Clamp to [0, 3] to avoid errors (since values can't be negative)
        channelValues[1] = Mathf.Clamp(channelValues[1], 0f, 3f);

        // Channel 2: the enemy tank's health (layer 9 is the tanks' layer here)
        if (goRb.gameObject.layer == 9)
        {
            TankHealth health = goRb.gameObject.GetComponent<TankHealth>();
            // Guard against objects on this layer that have no TankHealth component
            if (health != null)
            {
                channelValues[2] = health.m_NormalizedCurrentHealth;
            }
        }
    }
    return channelValues;
}
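As a side note on the clamping above: if the height channel needs a value that is already scaled to [0, 1], the clamp and the rescale could live in one small helper. This is only a sketch; `ChannelMath` and `Normalize` are hypothetical names I made up (they are not part of ML-Agents), and the [0, 3] range is the same one I clamp to in my method.

```csharp
using System;

public static class ChannelMath
{
    // Hypothetical helper, not part of ML-Agents:
    // clamps value to [min, max], then rescales it linearly to [0, 1].
    public static float Normalize(float value, float min, float max)
    {
        if (max <= min)
        {
            throw new ArgumentException("max must be greater than min");
        }

        // Math.Max/Math.Min are used so the helper has no Unity dependency.
        float clamped = Math.Max(min, Math.Min(max, value));
        return (clamped - min) / (max - min);
    }
}
```

In GetObjectData this would be called as `channelValues[1] = ChannelMath.Normalize(goRb.position.y, 0f, 3f);`, so a shell at height 1.5 would be stored as 0.5.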
My Questions related to GridSensor
- Is there an official implementation example of the modified GridSensor explained in the documentation (the example with health and enemy)? As you can see above, my version is quite dirty, and I want to learn the best practices for GridSensor. There is a good example in the GridSensor code, but it only detects the position of a rigidbody.
- In the documentation, in the Channel Based section, they say:
To distinguish between categorical and continuous data, one would use the ChannelDepth array to signify the ranges that the values in the channelValues array could take. If one sets ChannelDepth to be 1, it is assumed that the value of channelValues is already normalized. Else ChannelDepth represents the total number of possible values that channelValues can take.
→ Does this mean that if we have a range of values (for instance health: 0 - 100), we set the corresponding ChannelDepth element to 100 and the GridSensor normalizes the value automatically? Or do we need to normalize it ourselves?
- In the documentation, in the Channel Hot section, they say:
ChannelDepth = {3, 5}
Like in the previous example, the “enemy” in the example is encoded as [0, 0, 1].
For the “health” however, the 5 signifies that the health should be represented by a OneHot encoding of 5 possible values, and in this case that encoding is round(.6*5) = round(3) = 3 => [0, 0, 0, 1, 0].
→ What I understand is that detected tags are automatically encoded into a one-hot array, but continuous values (such as health) are not.
What I don’t understand is this: if we want a one-hot array of 5 for health (i.e. we set this ChannelDepth element to 5), do we need to normalize the continuous value ourselves so that the GridSensor can one-hot encode it automatically? For instance, if health = 90, do we normalize it ourselves to 0.9, and the GridSensor then transforms it into [0,0,0,0,1]?
- Do you think that, for my Tank environment, it’s better to use the Channel Based or the Channel Hot (one-hot) version of the Grid Depth type? And do you have any tips on how to choose?
- Is there a way to see the whole output of the GridSensor, I mean before it is transformed into a PNG[], in order to debug?
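To make the Channel Hot question above concrete, here is the arithmetic from the documentation's example written out as plain C#. It is only a sketch of the quoted formula, round(value * depth), and not the GridSensor's actual code; `OneHotSketch` and `Encode` are hypothetical names, and the final clamp of the index is my own assumption.

```csharp
using System;

public static class OneHotSketch
{
    // Hypothetical helper that mirrors the documentation's arithmetic:
    // index = round(normalizedValue * depth), then a 1 is set at that index.
    public static float[] Encode(float normalizedValue, int depth)
    {
        if (depth <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(depth));
        }

        float[] oneHot = new float[depth];
        int index = (int)Math.Round(normalizedValue * depth);

        // Assumption: clamp the index, because a value of 1.0 gives
        // index == depth, which would not fit in an array of length depth.
        index = Math.Max(0, Math.Min(depth - 1, index));

        oneHot[index] = 1f;
        return oneHot;
    }
}
```

With the documentation's numbers, `OneHotSketch.Encode(0.6f, 5)` gives [0, 0, 0, 1, 0]. Written this way, the formula expects a value that is already in [0, 1], which is exactly what I would like to have confirmed for the real GridSensor.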
Again, thanks for your help,