How to control the car using heuristic option in ML-Agents

Hello Folks,
I hope everyone is doing fine. I am new to Unity and new to C# coding. I am trying to drive the car from the Standard Assets package through my CarAgent script. The goal is reinforcement learning: the car can accelerate, brake, or do nothing (3 actions) in order to maintain a distance from a target object, let’s say another car that is moving at a certain speed over time. For reference I am attaching my C# scripts.
One thing I would like to mention: I can control the car if I attach the CarUserControl script instead of the CarAgent script (Heuristic option), so I don’t know what I am missing here. I guess something related to void FixedUpdate and void Awake is missing, but I am not sure.

I would highly appreciate it if someone could help me with this.

6603886–751234–CarController.cs (13.6 KB)
2271884–153185–CarUserControl.cs (905 Bytes)
6603886–751231–CarAgent.cs (4 KB)

Since it is a standard asset, I doubt there’s an error in the code.
Maybe the problem is in the agent setup.

The Heuristic function is called inside the RequestDecision() function.
Attach a Decision Requester script to the agent. The script will call RequestDecision() automatically every few steps.
The agent will gather observations for the inputs, but without a neural net to process them, it will ask the user to provide outputs through the Heuristic function.
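
A minimal sketch of the same setup done from code instead of the Inspector. HeuristicSetup is a hypothetical helper name, not part of the attached scripts, and the decision period of 5 is just an example value. It assumes the agent and its Behavior Parameters component are already on the same GameObject.

```csharp
using UnityEngine;
using Unity.MLAgents;
using Unity.MLAgents.Policies;

// Hypothetical helper: put it on the same GameObject as the agent.
public class HeuristicSetup : MonoBehaviour
{
    void Awake()
    {
        // Force the agent to take its actions from Heuristic() instead of a model.
        var behavior = GetComponent<BehaviorParameters>();
        behavior.BehaviorType = BehaviorType.HeuristicOnly;

        // The Decision Requester calls RequestDecision() on the agent for us.
        var requester = gameObject.AddComponent<DecisionRequester>();
        requester.DecisionPeriod = 5;                 // example value, counted in Academy steps
        requester.TakeActionsBetweenDecisions = true; // repeat the last action in between
    }
}
```

Setting the same values by hand in the Inspector should be equivalent; the script only makes the required pieces explicit.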


Many thanks, m4l4, for your input.

The Decision Requester was already attached to my car, but I think what I was missing was the Decision Period, which I set to 1 (this requests a decision on every Academy step, not every second), and the “Take Actions Between Decisions” check mark.
In the Behavior Parameters component I changed the space size to 3 (which means I have 3 actions: accelerate, brake, and do nothing).

For future reference, if someone is having the same problem,
the updated CarAgent script is as follows:

using UnityEngine;
using Unity.MLAgents;
using Unity.MLAgents.Sensors;
using Unity.MLAgents.Policies;
using UnityStandardAssets.CrossPlatformInput;

namespace UnityStandardAssets.Vehicles.Car
{
    [RequireComponent(typeof(CarController))]
    public class CarAgent : Agent
    {
        private Vector3 originalPosition;
        private Vector3 Targetorginalposition;
        private BehaviorParameters behaviorParameters;
        private CarController carController;
        private Rigidbody rbody;

        public Transform Target;

        public override void Initialize()
        {
            originalPosition = this.transform.localPosition;
            // Remember the original position of the target
            Targetorginalposition = Target.localPosition;

            behaviorParameters = GetComponent<BehaviorParameters>();
            carController = GetComponent<CarController>();
            rbody = carController.GetComponent<Rigidbody>();

            Reset();
        }

        public override void OnEpisodeBegin()
        {
            Reset();
        }

        private void Reset()
        {
            this.transform.localPosition = originalPosition;
            // Put the target back at its starting position
            Target.localPosition = Targetorginalposition;
        }

        public override void CollectObservations(VectorSensor sensor)
        {
            sensor.AddObservation(Target.localPosition);
            sensor.AddObservation(this.transform.localPosition);
        }

        public override void OnActionReceived(float[] vectorAction)
        {
            var direction = Mathf.FloorToInt(vectorAction[0]);

            // CarController.Move(steering, accel, footbrake, handbrake)
            switch (direction)
            {
                case 0: // do nothing, i.e. idle
                    break;

                case 1: // move forward
                    carController.Move(0f, 1f, 0f, 0f);
                    break;

                case 2: // brake / move backwards
                    carController.Move(0f, 0f, -1f, 0f);
                    break;
            }

            float distanceToTarget = Vector3.Distance(this.transform.localPosition, Target.localPosition);

            // Exact float equality is almost never true for a computed distance,
            // so this branch will rarely fire as written.
            if (distanceToTarget == 8.0f)
            {
                SetReward(1.0f);
            }
            // Must be && here: with || the condition is always true and the
            // branches below can never run.
            else if (distanceToTarget >= 7.5f && distanceToTarget <= 7.9f)
            {
                SetReward(0.5f);
            }
            else if (distanceToTarget < 7.5f)
            {
                EndEpisode(); // could be Reset
            }
            else if (Target.localPosition.z == 1990.486f)
            {
                EndEpisode();
            }
            // AddReward(-1f / MaxStep);
        }

        public override void Heuristic(float[] actionsOut)
        {
            actionsOut[0] = 0;
            if (Input.GetKey(KeyCode.UpArrow))
            {
                actionsOut[0] = 1;
            }
            else if (Input.GetKey(KeyCode.DownArrow))
            {
                actionsOut[0] = 2;
                // carController.Move(0f, 0f, -1f, 0);
            }
        }
    }
}

Looking at the code, it seems you are using a discrete action space.
For a car control problem, a continuous action space might be more appropriate.

Think about it this way:
With discrete actions, you are choosing either to press the pedal or not, but if you do, you go full throttle only.
There’s no middle ground, since you are not choosing HOW MUCH you want to accelerate.
The same goes for steering and braking.

With continuous control, the agent will output floats between -1 and 1. You can then multiply the values by your maxSpeed and maxSteer variables to get the desired acceleration or steer angle.
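
A rough sketch of that scaling in plain C#. The class name and the two maximum values are made up for illustration and are not taken from CarController; inside Unity you would normally use Mathf.Clamp instead of the small helper here. As far as I can tell, the Standard Assets CarController.Move already expects normalized inputs and scales them internally, so this matters mainly if you drive the wheels yourself.

```csharp
using System;

public static class ContinuousControl
{
    // Hypothetical tuning values, chosen only for this example.
    public const float MaxSteerAngle = 25f;    // degrees
    public const float MaxMotorTorque = 2500f; // arbitrary units

    // Keep a raw action inside the expected [-1, 1] range.
    public static float Clamp(float value)
    {
        return Math.Max(-1f, Math.Min(1f, value));
    }

    // Scale a normalized steering action to a steer angle.
    public static float SteerAngle(float steerAction)
    {
        return Clamp(steerAction) * MaxSteerAngle;
    }

    // Scale a normalized throttle action to a motor torque.
    public static float MotorTorque(float throttleAction)
    {
        return Clamp(throttleAction) * MaxMotorTorque;
    }
}
```

For example, a steering output of 0.5 becomes a 12.5 degree steer angle with these values, and anything outside (-1, 1) is capped at the maximum.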


Once again, thanks m4l4, that is very good advice. I modified the code accordingly.

Changes made in the code:
// ActionBuffers needs: using Unity.MLAgents.Actuators;
public override void OnActionReceived(ActionBuffers actions)
{
    var accelerate = Mathf.Clamp(actions.ContinuousActions[0], 0f, 1f); // value ranges from 0 to 1 for throttle
    var brake = Mathf.Clamp(actions.ContinuousActions[1], -1f, 0f); // value ranges from -1 to 0 for brake
    if (accelerate >= 0 || brake <= 0) // always true after clamping, so it also works without the if statement
    {
        carController.Move(0f, accelerate, brake, 0f);
    }
}

You are welcome. I’ve worked on a similar project myself and I remember the headache.

Avoid clamping the outputs like that.
clamp(val, 0, 1) means that everything below 0 will be read as 0. That way you are ignoring half of the output.
Instead, remap the values from a (-1, 1) range to a (0, 1) range.

You can first clamp the output to (-1, 1) (it is done automatically, but the docs say it’s good practice to do it anyway),
then remap it with:
action[0] = (action[0] + 1f) * 0.5f;

You’ll get a float between 0 and 1, and no part of the output will be ignored or misinterpreted.
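
Here is the clamp-then-remap step as a small standalone helper, as a sketch only. ActionRemap and ToUnitRange are names invented for this example; in a Unity script you could use Mathf.Clamp in place of the Math.Max/Math.Min pair.

```csharp
using System;

public static class ActionRemap
{
    // Clamp a raw continuous action to [-1, 1], then remap it linearly to [0, 1].
    public static float ToUnitRange(float raw)
    {
        float clamped = Math.Max(-1f, Math.Min(1f, raw));
        return (clamped + 1f) * 0.5f;
    }
}
```

With this, an output of -1 maps to 0 (no throttle), 0 maps to 0.5 (half throttle), and 1 maps to 1 (full throttle), so the whole output range is used.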


Yes, you are right :) Thanks!