Hi, I'm a beginner with ML-Agents. Recently I tried to train three agents to round up a stationary target, each holding a certain angle around it. I control the agents with force and torque. The results are very bad, and I don't know what is causing the problem.
My settings are as follows:
1. Observation:
A ray perception 3D component plus a vector observation of size 14, for example:
var m_velocity=transform.InverseTransformDirection(m_AgentRb.velocity);
sensor.AddObservation(new Vector2(m_velocity.x, m_velocity.z));//2
var anglSpeed = transform.InverseTransformDirection(m_AgentRb.angularVelocity);
sensor.AddObservation(anglSpeed.y);//1
distance between agent and target//1
sensor.AddObservation(forwardForce / m_AgentRb.mass);//3
sensor.AddObservation(rotationTorque);//3
the average of three agents’ distance between …//1
The angle enclosed by adjacent agents and the target//2
var angleBetween = Vector2.Dot(transform.InverseTransformDirection(transform.forward),
transform.InverseTransformDirection(new Vector2(tarToAgent.x, tarToAgent.z)));
sensor.AddObservation(angleBetween);//1
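One thing that may be worth checking in the last observation: `Vector2.Dot` takes `Vector2` arguments, and Unity's implicit `Vector3` to `Vector2` conversion keeps only x and y. Since `transform.InverseTransformDirection(transform.forward)` is the local forward (0, 0, 1), the first argument would become (0, 0) and the dot product would always be 0, if the snippet is read correctly. Below is a minimal sketch of a planar alternative, written in plain C# without Unity types so it can be run on its own. The names `PlanarAngle` and `CosBetween` are hypothetical and not part of ML-Agents, and the sketch assumes the intended observation is the cosine of the angle between the agent's heading and the target-to-agent direction on the XZ plane.

```csharp
using System;

// Hypothetical helper, not part of Unity or ML-Agents.
public static class PlanarAngle
{
    // Cosine of the angle between two directions on the XZ plane, in [-1, 1].
    // Returns 0 when either direction is (near) zero length, so the result
    // is never NaN.
    public static float CosBetween(float ax, float az, float bx, float bz)
    {
        float lenA = (float)Math.Sqrt(ax * ax + az * az);
        float lenB = (float)Math.Sqrt(bx * bx + bz * bz);
        if (lenA < 1e-6f || lenB < 1e-6f)
        {
            return 0f;
        }
        float cos = (ax * bx + az * bz) / (lenA * lenB);
        // Clamp away floating-point overshoot such as 1.0000001.
        return Math.Max(-1f, Math.Min(1f, cos));
    }
}
```

In the agent this might be called as `PlanarAngle.CosBetween(transform.forward.x, transform.forward.z, tarToAgent.x, tarToAgent.z)`, which also keeps the observation in [-1, 1].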
2. Action:
var forwardGo = Vector3.zero;
var rotationGo = Vector3.zero;
var continueAction = actionBuffers.ContinuousActions;
var a1 = Mathf.Clamp(continueAction[0], 0, 1);
var a2 = Mathf.Clamp(continueAction[1], -1, 1);
forwardGo = transform.InverseTransformVector(transform.forward)*a1;
rotationGo = transform.InverseTransformVector(transform.up) * a2;
forwardForce = forwardGo * m_Setting.agentSpeed;
rotationTorque = rotationGo * m_Setting.agentAngularSpeed;
m_AgentRb.AddRelativeForce(forwardForce);
m_AgentRb.AddRelativeTorque(rotationTorque );
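A side effect of `Mathf.Clamp(continueAction[0], 0, 1)` above is that every negative value of the first action maps to zero thrust, so roughly half of that action's range does nothing. Whether that matters for training here is unclear; the sketch below only shows one possible alternative, a linear remap from [-1, 1] to [0, 1]. It is plain C# so it can be run without Unity, and `ActionMapping` and `ToUnitRange` are hypothetical names, not ML-Agents API.

```csharp
using System;

// Hypothetical helper, not part of Unity or ML-Agents.
public static class ActionMapping
{
    // Maps a continuous action from [-1, 1] linearly onto [0, 1].
    // Inputs outside [-1, 1] are clamped first.
    public static float ToUnitRange(float action)
    {
        float clamped = Math.Max(-1f, Math.Min(1f, action));
        return (clamped + 1f) * 0.5f;
    }
}
```

With this mapping, `a1` would be computed as `ActionMapping.ToUnitRange(continueAction[0])`, so an action of -1 gives no thrust, 0 gives half thrust, and 1 gives full thrust.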
3. Reward:
Single agent: distance, angle
Group: task achieved, collision, out of bounds, time penalty
4. YAML:
My Behavior:
  trainer_type: poca
  hyperparameters:
    batch_size: 512
    buffer_size: 10240
    learning_rate: 0.0003
    beta: 0.005
    epsilon: 0.2
    lambd: 0.95
    num_epoch: 3
    learning_rate_schedule: linear
  network_settings:
    normalize: true
    hidden_units: 512
    num_layers: 3
    vis_encode_type: simple
  reward_signals:
    extrinsic:
      gamma: 0.99
      strength: 1.0
      network_settings:
        normalize: true
  keep_checkpoints: 5
  max_steps: 50000000
  time_horizon: 64
  summary_freq: 30000
Also, after training for a while, the following error appears, and it occurs five times at once:
ArgumentException: NaN increment passed to AddReward.
Unity.MLAgents.Utilities.DebugCheckNanAndInfinity (System.Single value, System.String valueCategory, System.String caller) (at D:/ml-agents-release_17/com.unity.ml-agents/Runtime/Utilities.cs:58)
Unity.MLAgents.Agent.AddReward (System.Single increment) (at D:/ml-agents-release_17/com.unity.ml-agents/Runtime/Agent.cs:729)
PurseAgent.MoveAgent (Unity.MLAgents.Actuators.ActionBuffers actionBuffers) (at Assets/Cooperation/Scripts/PurseAgent.cs:290)
PurseAgent.OnActionReceived (Unity.MLAgents.Actuators.ActionBuffers actions) (at Assets/Cooperation/Scripts/PurseAgent.cs:241)
Unity.MLAgents.Actuators.VectorActuator.OnActionReceived (Unity.MLAgents.Actuators.ActionBuffers actionBuffers) (at D:/ml-agents-release_17/com.unity.ml-agents/Runtime/Actuators/VectorActuator.cs:76)
Unity.MLAgents.Actuators.ActuatorManager.ExecuteActions () (at D:/ml-agents-release_17/com.unity.ml-agents/Runtime/Actuators/ActuatorManager.cs:295)
Unity.MLAgents.Agent.AgentStep () (at D:/ml-agents-release_17/com.unity.ml-agents/Runtime/Agent.cs:1344)
Unity.MLAgents.Academy.EnvironmentStep () (at D:/ml-agents-release_17/com.unity.ml-agents/Runtime/Academy.cs:589)
Unity.MLAgents.AcademyFixedUpdateStepper.FixedUpdate () (at D:/ml-agents-release_17/com.unity.ml-agents/Runtime/Academy.cs:43)
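The trace shows that the value passed to `AddReward` at `PurseAgent.cs:290` is already NaN, so it is produced on the C# side, either by the reward formula or by NaN inputs such as actions or physics state. Common sources in float math are 0/0 divisions, `Mathf.Acos` or `Mathf.Asin` with an argument slightly outside [-1, 1], and square roots of negative numbers; which one applies here cannot be told from the post. As a debugging aid, here is a minimal sketch of a guard that could wrap the reward before it reaches `AddReward`. It is plain C# so it runs without Unity, and `RewardGuard` with its methods is a hypothetical helper, not part of ML-Agents.

```csharp
using System;

// Hypothetical helper, not part of Unity or ML-Agents.
public static class RewardGuard
{
    // True when the value is a finite number (neither NaN nor +/- infinity).
    public static bool IsFinite(float value)
    {
        return !float.IsNaN(value) && !float.IsInfinity(value);
    }

    // Returns the reward unchanged when it is finite, otherwise the fallback.
    public static float Sanitize(float reward, float fallback = 0f)
    {
        return IsFinite(reward) ? reward : fallback;
    }

    // Division that returns the fallback when the denominator is (near) zero,
    // instead of producing NaN or infinity.
    public static float SafeDivide(float numerator, float denominator, float fallback = 0f)
    {
        return Math.Abs(denominator) < 1e-6f ? fallback : numerator / denominator;
    }
}
```

Sanitizing only hides the symptom, so it is probably more useful to log the inputs of the reward whenever `RewardGuard.IsFinite` returns false and trace back which term went wrong.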