Train Machine Learning Agent to Drive the Standard Assets Car from Unity

Hi all,

After digging deep into the Internet, my team and I are creating this thread to find some help and resources.
We are creating a procedural maze with a start position and a random-ish end position. Our idea is to make a
Machine Learning agent drive the car provided by Unity's Standard Assets, but we are not having any success with it. We managed to make a simpler agent run through the maze and find the end, but for some reason the car keeps running into walls. We have tried different hyperparameters and observations, and we also tried PPO, SAC, and even imitation learning.

If someone has any advice or resources, I would appreciate any help.
Below is the agent code → CarAgent.cs

public override void CollectObservations(VectorSensor sensor)
    {
        // 14 observations in total; this must match the vector observation size
        // set in the Behavior Parameters component.
        sensor.AddObservation(transform.localPosition); // 3
        sensor.AddObservation(this.transform.forward); // 3
        sensor.AddObservation(this.transform.InverseTransformPoint(target.transform.position)); // 3
        sensor.AddObservation(this.transform.InverseTransformVector(carBody.velocity)); // 3
        sensor.AddObservation(m_Car.CurrentSteerAngle); // 1
        sensor.AddObservation(m_Car.CurrentSpeed); // 1
    }

    public override void OnActionReceived(ActionBuffers actions)
    {
        if (completedRace) return;
      
        MoveAgent(actions);
      
        if (transform.localPosition.y < -0.5f)
        {
            StopCar();
            transform.localPosition = raceManager.startPoint;
            EndEpisode();
        }

        if (carBody.transform.up.y < 0.75f)
        {
            StopCar();
            EndEpisode();
        }

        if (StepCount == MaxStep)
        {
            StopCar();
            EndEpisode();
        }

        AddReward(-1f / MaxStep);

    }
  
    private void OnTriggerEnter(Collider other)
    {
        if (other.CompareTag("Target"))
        {
            //StartCoroutine(Finished());
            StopCar();
            completedRace = true;
            AddReward(1f);
        }
    }

    private void OnCollisionEnter(Collision other)
    {
        if (other.gameObject.CompareTag("Wall"))
        {
            AddReward(-0.05f);
        }
    }

    private void OnCollisionStay(Collision other)
    {
        if (other.gameObject.CompareTag("Wall"))
        {
            AddReward(-1f / MaxStep);
        }
    }

And this is the last training configuration we tried → CarAgent.yaml

behaviors:
  CarAgent:
    trainer_type: ppo
    hyperparameters:
      batch_size: 64
      buffer_size: 10240
      learning_rate: 1e-3
      beta: 1e-2
      epsilon: 0.15
      lambd: 0.93
      num_epoch: 8
      learning_rate_schedule: linear
    network_settings:
      normalize: true
      hidden_units: 256
      num_layers: 3
      vis_encode_type: simple
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
      curiosity:
        strength: 0.05
        gamma: 0.99
        encoding_size: 256
        learning_rate: 3e-4
    keep_checkpoints: 5
    max_steps: 1e8
    time_horizon: 128
    summary_freq: 10000
    threaded: true

Thanks.


Hi @DiogoQueiroz,
Have you tried using the Ray Perception Sensor to detect the walls? We helped an internal team train karts to drive before; they trained relatively quickly and learned to avoid the walls.
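
For reference, here is a rough sketch of how a ray sensor might be set up in code. Normally you would configure a RayPerceptionSensorComponent3D in the Inspector; the numbers below are just example values, and the "Wall"/"Target" tags are taken from your snippets.

using System.Collections.Generic;
using UnityEngine;
using Unity.MLAgents.Sensors;

public class CarRaySensorSetup : MonoBehaviour
{
    void Awake()
    {
        // Example values only; these fields are usually set on the
        // RayPerceptionSensorComponent3D in the Inspector.
        var rays = gameObject.AddComponent<RayPerceptionSensorComponent3D>();
        rays.DetectableTags = new List<string> { "Wall", "Target" }; // walls and the goal need separate, detectable tags
        rays.RaysPerDirection = 5;    // rays fanned out on each side of the forward ray
        rays.MaxRayDegrees = 90f;     // spread of the rays to each side
        rays.RayLength = 30f;         // long enough to see walls well ahead of the car
        rays.SphereCastRadius = 0.5f; // thicker casts are more forgiving against thin walls
    }
}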

Your per-step reward penalty may be incentivizing the agent to “kill itself” more quickly to avoid accumulating an even lower reward.
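
One way to counter that (just a sketch based on your snippet above, with an assumed penalty value) is to make falling off or flipping at least as costly as driving, for example:

// Sketch only (the penalty value is an assumption): give crashing/flipping a terminal penalty
// so that ending the episode early is never cheaper than continuing to drive.
if (transform.localPosition.y < -0.5f || carBody.transform.up.y < 0.75f)
{
    StopCar();
    AddReward(-1f); // dying should cost at least as much as the worst slow run
    EndEpisode();
}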

Could you also post your MoveAgent method?

Hi @christophergoy, I believe we are using ray casts to detect walls; we might not be using them correctly, since this field is new to us. Below is a snapshot, and you can also see in it that the ray cast detects the finish point as well.
[Screenshot: ray sensor on the car, with a ray detecting the finish point]

And how could we make the car avoid killing itself and head straight for the finish point?
Below is the code for MoveAgent:

private void MoveAgent(ActionBuffers actionBuffers)
    {
        // var discreteActions = actionBuffers.DiscreteActions;
        // float accel = 0;
        // float steer = 0;
        //
        // var action = discreteActions[0];
        // switch (action)
        // {
        //     case 1:
        //         accel = 1f;
        //         break;
        //     case 2:
        //         accel = -1f;
        //         break;
        //     case 3:
        //         steer = 1f;
        //         break;
        //     case 4:
        //         steer = -1f;
        //         break;
        // }

        //var continuousActions = actionBuffers.ContinuousActions;
        //var accel = Mathf.Clamp(continuousActions[0], -1f, 1f);
        //var steer = Mathf.Clamp(continuousActions[1], -1f, 1f);

        float forwardAmount = 0f;
        float turnAmount = 0f;

        // Discrete branch 0: throttle (1 = forward, 2 = reverse/brake)
        switch (actionBuffers.DiscreteActions[0])
        {
            case 0:
                forwardAmount = 0f;
                break;
            case 1:
                forwardAmount = +1f;
                break;
            case 2:
                forwardAmount = -1f;
                break;
        }

        // Discrete branch 1: steering
        switch (actionBuffers.DiscreteActions[1])
        {
            case 0:
                turnAmount = 0f;
                break;
            case 1:
                turnAmount = +1f;
                break;
            case 2:
                turnAmount = -1f;
                break;
        }

        // CarController.Move(steering, accel, footbrake, handbrake)
        m_Car.Move(turnAmount, forwardAmount, forwardAmount, 0f);

    }

Hi @christophergoy ,

I’m on @DiogoQueiroz’s team; just to complement his answer and give you some more info.
We are trying to train it by increasing the complexity little by little. The training starts in a small, empty area like the one in Diogo’s snapshot; after some 30 episodes it increases the complexity a bit, and once the complexity is at its maximum, we increase the maze size and start the process again. Should we try a different flow?

Also, we are currently using a Decision Requester period of 3. Since the agent moves fast, we tried a lower value (1), but then the agent basically doesn’t move away from the start. What should we look at to find the best value for this?

I’ve been running training for almost 10 million steps and the agent keeps getting stuck against the wall, like this:
[Screenshot: the car stuck against a maze wall]

Thanks for all of the info @DiogoQueiroz and @casbas ,
Just to clarify, do the walls and the goal target have different tags that are detectable by the ray casts? I could imagine a situation where the agent thinks the goal and the walls are the same if they aren’t differentiated. It may see a wall and think it’s headed toward the goal.

This sounds reasonable to me; there is a proper workflow for this, called curriculum learning, within ML-Agents that you could use. It allows you to pass different environment parameters to the Unity environment from Python based on how well the agent is doing in the current curriculum.
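
As a rough sketch on the Unity side (the parameter name "maze_complexity" is made up here and would have to match an environment_parameters entry in your trainer config), the agent can read the current lesson value like this:

using Unity.MLAgents;
using UnityEngine;

public class MazeCurriculum : MonoBehaviour
{
    // Sketch: read the lesson value that the Python trainer passes in.
    // "maze_complexity" is a placeholder name and must match an
    // environment_parameters entry in the training YAML.
    public float GetMazeComplexity()
    {
        return Academy.Instance.EnvironmentParameters.GetWithDefault("maze_complexity", 0f);
    }
    // e.g. call this from the agent's OnEpisodeBegin before rebuilding the maze.
}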

A period of 3 sounds reasonable; you could try bumping it up to 5 to see if you get better results.

Hi @christophergoy, we have different tags, shown below. The walls are a single mesh with a mesh collider; could that be an issue with the detection?
[Screenshot: inspector showing the separate wall and target tags]

But if we increase this value for the Decision Requester, does that mean the agent will take longer to make a decision?

Yes, it means that every 5 steps, the agent will make a decision.
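
As a small sketch (you would normally just set these values on the DecisionRequester component in the Inspector):

// With DecisionPeriod = 5 the agent requests a new action every 5 Academy steps and, when
// TakeActionsBetweenDecisions is true, repeats its last action on the steps in between.
var requester = GetComponent<Unity.MLAgents.DecisionRequester>();
requester.DecisionPeriod = 5;
requester.TakeActionsBetweenDecisions = true;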

For the kart game we worked with, the ray casts were spread all around the vehicle. I’m not sure if your car can back up or not, but it doesn’t seem to have any ray cast vision behind the front bumper, which may make it think it can just back up and turn a certain way when in fact it cannot.

We do have more ray casts, I just selected one by mistake.
[Screenshot: the car with ray casts spread around it]


So, this is a short GIF of our car moving through the maze. Below are the graphs from TensorBoard.



I’m not sure whether those graphs look good or whether the values are trending the way they should. Any insight?
Thanks for all the help.

I’m working on something similar and found a few things useful:

This may seem obvious and you have probably done it already: play in heuristic mode and log all rewards, drive the car around, and test all possible cases to make sure rewards/penalties are being sent to the agent with the values you’d expect.
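
For the car above, a heuristic override could look roughly like this (a sketch that would live in CarAgent.cs next to OnActionReceived; the key bindings and logging are just examples):

// Sketch for CarAgent.cs: drive with the keyboard in heuristic mode and log the cumulative
// reward so each reward/penalty can be checked by hand.
// Branch 0 = throttle, branch 1 = steering, matching the switches in MoveAgent.
public override void Heuristic(in ActionBuffers actionsOut)
{
    var discrete = actionsOut.DiscreteActions;
    float v = Input.GetAxis("Vertical");
    float h = Input.GetAxis("Horizontal");
    discrete[0] = v > 0.1f ? 1 : (v < -0.1f ? 2 : 0);
    discrete[1] = h > 0.1f ? 1 : (h < -0.1f ? 2 : 0);
    Debug.Log($"Cumulative reward: {GetCumulativeReward():F3}");
}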

Curiosity + penalties can cause a survivorship bias (mentioned above). I found it worthwhile to go as bare-bones as possible with the network/RL algorithm parameters and aim for the simplest version of your goal (i.e. no curiosity and the simplest version of the task).

Training only on the first part of the curriculum (empty area with walls) and getting a stable model that doesn’t run into walls gives you something to initialize-from for the next step (pretraining). This can be a sanity check that things are coded properly. If your car is still running into walls after training in the open area, something is wrong with your perception.

You can consider using GAIL and/or Behavioral Cloning to jump-start your learning a little via demonstrations. This page in the ML-Agents docs is very informative. If you do this, you will most likely have to create a sparser reward system.
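
On the Unity side, recording the demonstrations is done with the DemonstrationRecorder component next to the agent; here is a rough sketch (the demo name and directory below are placeholders):

// Sketch: add a DemonstrationRecorder next to the CarAgent, drive in heuristic mode to
// record a .demo file, then reference that file from the gail / behavioral_cloning
// sections of the training YAML.
var recorder = gameObject.AddComponent<Unity.MLAgents.Demonstrations.DemonstrationRecorder>();
recorder.Record = true;
recorder.DemonstrationName = "CarAgentDemo";          // placeholder name
recorder.DemonstrationDirectory = "Assets/Demonstrations"; // placeholder path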

I have to give huge props to mbaske - his videos on YouTube and his repos are great to learn from. His grid sensor example has a self-driving car that you may be able to pull inspiration from.

My naive guess is that either your perception is messed up (tags/layers) or your reward presentation isn’t conveying the concept of your goal to the agent.

Hey @WaxyMcRivers, thanks for the hints. We are still trying to find a good way to train it.
At the moment I’m trying to get a stable model as you said, in just an empty area.

This YouTube channel has a lot of good stuff; I hope it will help. Thanks!