Train an agent to command a non-uniform group of units

As an experiment, I decided to train an ML agent to be used as an opponent in my prototype RTS game.

The plan is for the agent to control a group of units of various classes (warrior, archer, etc.).

I understand that I have to run each training session with a different number and composition of units.

To select a unit to give an order to, I use something like the following code:

    public override void OnActionReceived(float[] vectorAction)
    {
        // Process actions: element 0 selects the unit, elements 1-3 give the target position
        int selectedUnitIndex = (int)vectorAction[0]; // explicit cast: float does not convert to int implicitly
        Vector3 targetPosition = new Vector3(vectorAction[1], vectorAction[2], vectorAction[3]);

        Unit selectedUnit = GetUnitAtIndex(selectedUnitIndex); // getting the unit based on the index
        selectedUnit.MoveTo(targetPosition); // Example command
    }

Is this the right approach? What confuses me is that even if the units' properties are passed as observations, the agent will not be able to “understand” which properties belong to the unit it is controlling at the moment, since the list of units will be different in each training session.

Just an idea:
1 value to represent the “unit type” (warrior, archer, caster, etc.). Keep an array of unit types.
1 value to represent the “unit action” (move, shoot, cast spell, dance, etc.). Keep an enum/array of ALL possible actions across all unit types.
3 values to represent the “target position”.
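To make that layout concrete, here is a rough sketch of decoding a five-element action array into a unit type, an action, and a target position. The `UnitType` and `UnitAction` enums, the `DecodedAction` struct, and the `ActionDecoder` class are all made-up names for illustration, and the element order is only an assumption. It is plain C# with no Unity types so it can be tried outside the engine.

```csharp
using System;

// Hypothetical enums: names and ordering are assumptions for illustration.
public enum UnitType { Warrior, Archer, Caster }
public enum UnitAction { Move, Shoot, CastSpell }

// Plain container for one decoded command (assumed layout, not an ML-Agents type).
public struct DecodedAction
{
    public UnitType Type;
    public UnitAction Action;
    public float X, Y, Z;
}

public static class ActionDecoder
{
    // Assumed layout: [0] unit type, [1] unit action, [2..4] target position.
    public static DecodedAction Decode(float[] vectorAction)
    {
        if (vectorAction == null || vectorAction.Length < 5)
            throw new ArgumentException("Expected at least 5 action values.");

        int typeCount = Enum.GetValues(typeof(UnitType)).Length;
        int actionCount = Enum.GetValues(typeof(UnitAction)).Length;

        // Clamp so an out-of-range value never maps outside the enums.
        int type = Math.Max(0, Math.Min(typeCount - 1, (int)vectorAction[0]));
        int action = Math.Max(0, Math.Min(actionCount - 1, (int)vectorAction[1]));

        return new DecodedAction
        {
            Type = (UnitType)type,
            Action = (UnitAction)action,
            X = vectorAction[2],
            Y = vectorAction[3],
            Z = vectorAction[4],
        };
    }
}
```

Inside `OnActionReceived` you would call something like `ActionDecoder.Decode(vectorAction)` and then pick a unit of the decoded type, rather than indexing into a list whose contents change every session.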

Then on action, give a small reward if the “selected action” can be performed by the “selected unit type”, or maybe a small penalty if it can’t. So all three types could move, but only types 0 & 1 can shoot, and only the third type can cast spells. (This lookup matrix would be a pain in the butt, or at least a large case statement.)
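The lookup does not have to be a giant case statement; a small boolean table indexed by unit type and action might be enough. The sketch below assumes three unit types and three actions numbered from zero (so the third type is index 2), and the reward and penalty sizes are placeholder values, not tuned ones. `ActionRules` is a hypothetical helper, not part of ML-Agents.

```csharp
public static class ActionRules
{
    // Rows: unit type 0, 1, 2. Columns: 0 = move, 1 = shoot, 2 = cast spell.
    // Contents are assumed from the example: everyone moves, types 0 and 1 shoot,
    // and the third type (index 2) casts spells.
    private static readonly bool[,] CanPerform =
    {
        { true, true,  false },
        { true, true,  false },
        { true, false, true  },
    };

    // Placeholder magnitudes: assumptions, not recommended values.
    public const float ValidReward = 0.01f;
    public const float InvalidPenalty = -0.01f;

    // True only when both indices are in range and the table allows the pair.
    public static bool IsValid(int unitType, int action)
    {
        bool inRange = unitType >= 0 && unitType < CanPerform.GetLength(0)
                    && action >= 0 && action < CanPerform.GetLength(1);
        return inRange && CanPerform[unitType, action];
    }

    // Small shaping signal for choosing a combination that makes sense.
    public static float ShapingReward(int unitType, int action)
    {
        return IsValid(unitType, action) ? ValidReward : InvalidPenalty;
    }
}
```

Adding a new unit type or action then means adding one row or column to the table, which is probably easier to keep in sync than a switch statement.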

Then, if it is one of the valid combinations of unit & action, call your actual code to perform the action and pass in the 3 target position numbers. A reward, a penalty, or nothing could then be applied again in the “action performing code” to signify whether the action succeeded: the spell hits or misses, the unit can or cannot move to the new position, and so on.
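Roughly, that two-stage flow (validate first, then perform and score the outcome) could look like the sketch below. `CommandDispatcher` is a hypothetical helper, the validity check and the game code are passed in as delegates so the example stands alone, and all three reward values are placeholders.

```csharp
using System;

public static class CommandDispatcher
{
    // Placeholder reward values: assumptions for illustration only.
    public const float InvalidPenalty = -0.01f;
    public const float SuccessReward = 0.1f;
    public const float FailurePenalty = -0.05f;

    // isValid: says whether this unit type can perform this action.
    // perform: your actual game code; assumed to return true when the action succeeds
    //          (the spell hits, the unit can reach the position, and so on).
    // Returns the reward to give the agent for this step.
    public static float Execute(
        int unitType, int action, float x, float y, float z,
        Func<int, int, bool> isValid,
        Func<int, int, float, float, float, bool> perform)
    {
        if (!isValid(unitType, action))
            return InvalidPenalty; // invalid combination: the game code is never called

        bool succeeded = perform(unitType, action, x, y, z);
        return succeeded ? SuccessReward : FailurePenalty;
    }
}
```

In an ML-Agents agent, the returned value would typically be handed to `AddReward` inside `OnActionReceived`.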

Anyhow, if I were gonna give it a go, that is what I would try.
