Racing Simulator ML-Agents

Hi, I’m having trouble setting up the OnActionReceived() function for ML-Agents. I’m using Realistic Car Controller V3 from the Asset Store. I created a race track and everything works perfectly, except for random behaviour from the car agent. Can anyone please give me some insight on how I should set this up? All help is much appreciated.

public override void OnActionReceived(float[] vectorAction)
{
    // Map each continuous action onto the controller's inputs,
    // clamped to the range each input expects.
    controller.gasInput = Mathf.Clamp(vectorAction[0], 0f, 1f);    // gas: 0..1
    controller.brakeInput = Mathf.Clamp(vectorAction[1], 0f, 1f);  // brake: 0..1
    controller.steerInput = Mathf.Clamp(vectorAction[2], -1f, 1f); // steering: -1..1
}

Gas input is for accelerating, values are 0-1 in the controller script.
Brake input is for braking, values are 0-1 in the controller script.
Steer input is for steering, values are -1 to 1 in the controller script.

These are normally driven by Input.GetAxis("Horizontal") and Input.GetAxis("Vertical"), as seen in the Heuristic method:

public override void Heuristic(float[] actionsOut)
{
    // Reset all actions, then set the one matching the current input.
    actionsOut[0] = 0;
    actionsOut[1] = 0;
    actionsOut[2] = 0;
    actionsOut[3] = 0;

    if (Input.GetAxis("Vertical") == 1)
    {
        // Accelerating
        actionsOut[0] = 1;
    }
    else if (Input.GetAxis("Vertical") == -1)
    {
        // Braking
        actionsOut[1] = 1;
    }
    else if (Input.GetAxis("Horizontal") == 1)
    {
        // Steer right
        actionsOut[2] = 1;
    }
    else if (Input.GetAxis("Horizontal") == -1)
    {
        // Steer left
        actionsOut[3] = 1;
    }
}

I will accept all criticism as I’m very new to ML-Agents, and thank you for all comments.

Initially, the agent will behave randomly in order to ‘explore’ the state and action space. Over time, the behavior should converge to something that seems ‘intentional’, provided you’ve formulated your reward function and observation space reasonably. This can take a long time depending on your problem. I would let it run for 5M timesteps and monitor your training in TensorBoard to see if your reward is increasing properly: https://github.com/Unity-Technologies/ml-agents/blob/main/docs/Using-Tensorboard.md
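As an illustration of what a ‘reasonable’ reward function could look like for a racing task, here is a minimal sketch. The checkpoint triggers and the "Wall" tag are assumptions made up for this example, not something from your project:

// Minimal reward-shaping sketch for a racing agent. Assumes you place
// trigger colliders tagged "Checkpoint" around the track and tag the
// barriers "Wall" -- both are hypothetical names for this example.
using UnityEngine;
using Unity.MLAgents;

public class CarAgent : Agent
{
    void OnTriggerEnter(Collider other)
    {
        if (other.CompareTag("Checkpoint"))
        {
            AddReward(0.1f); // reward progress along the track
        }
    }

    void OnCollisionEnter(Collision collision)
    {
        if (collision.gameObject.CompareTag("Wall"))
        {
            AddReward(-0.5f); // penalise hitting the barriers
            EndEpisode();     // reset so the agent tries again
        }
    }

    void FixedUpdate()
    {
        AddReward(-0.001f); // small per-step penalty encourages faster laps
    }
}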

Additionally, it looks like your Heuristic writes 4 actions whereas your OnActionReceived only reads 3. I believe the 3 actions for steering/gas/brake make sense.
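For what it’s worth, a 3-action Heuristic that matches the layout read in your OnActionReceived could look something like this (a sketch against the same float[] API your code uses, with the same action ordering):

public override void Heuristic(float[] actionsOut)
{
    float vertical = Input.GetAxis("Vertical");
    float horizontal = Input.GetAxis("Horizontal");

    // Same layout as OnActionReceived:
    // [0] gas (0..1), [1] brake (0..1), [2] steering (-1..1)
    actionsOut[0] = Mathf.Max(vertical, 0f);  // stick forward -> gas
    actionsOut[1] = Mathf.Max(-vertical, 0f); // stick back -> brake
    actionsOut[2] = horizontal;               // already in -1..1
}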

Not sure if this helps, but the continuous action space outputs values between -1 and 1. Clamping the values like you did means you are ignoring half of that range for vectorAction[0] and vectorAction[1]: every negative output is flattened to 0.

I think a better approach is to clamp the raw output values between -1 and 1 (this is done automatically, but as suggested by the ML-Agents team, it’s better to do it a second time), then remap the values to the desired range.

public override void OnActionReceived(float[] vectorAction)
{
    // Clamp the raw network outputs to the expected -1..1 range.
    controller.gasInput = Mathf.Clamp(vectorAction[0], -1f, 1f);
    controller.brakeInput = Mathf.Clamp(vectorAction[1], -1f, 1f);
    controller.steerInput = Mathf.Clamp(vectorAction[2], -1f, 1f);

    // Remap gas and brake from -1..1 to the 0..1 range the controller expects.
    controller.gasInput = Map(controller.gasInput, -1f, 1f, 0f, 1f);
    controller.brakeInput = Map(controller.brakeInput, -1f, 1f, 0f, 1f);
}

// The 1st range (low1..high1) is the original one, the 2nd (low2..high2) is the desired range.
public float Map(float value, float low1, float high1, float low2, float high2)
{
    float mappedValue = low2 + (value - low1) * (high2 - low2) / (high1 - low1);

    if (value < low1 || value > high1 || mappedValue < low2 || mappedValue > high2)
    {
        Debug.Log("Warning, outputs out of range!");
    }
    return mappedValue;
}

That way, a gasInput value of -0.2 doesn’t get ignored, but is treated like a +0.4 output.
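A quick sanity check of the remap, plugging the example numbers into Map:

// Remapping from -1..1 to 0..1:
Debug.Log(Map(-1f, -1f, 1f, 0f, 1f));   // 0.0 -> full negative output, no gas
Debug.Log(Map(-0.2f, -1f, 1f, 0f, 1f)); // 0.4 -> the example above
Debug.Log(Map(1f, -1f, 1f, 0f, 1f));    // 1.0 -> full positive output, full gas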