MA-POCA training: the results are very poor and the agents do not seem to learn at all

Hi, I’m a beginner with ML-Agents. Recently I tried to train three agents to surround a target and stay around it at a certain angle. I control them with force and torque. The results are very poor, and I don’t know what is causing the problem.
My settings are as follows:
1. observation:
A Ray Perception Sensor 3D component plus a vector observation of size 14, collected as follows:
    var m_velocity = transform.InverseTransformDirection(m_AgentRb.velocity);
    sensor.AddObservation(new Vector2(m_velocity.x, m_velocity.z)); // 2
    var anglSpeed = transform.InverseTransformDirection(m_AgentRb.angularVelocity);
    sensor.AddObservation(anglSpeed.y); // 1
    // distance between agent and target: 1
    sensor.AddObservation(forwardForce / m_AgentRb.mass); // 3
    sensor.AddObservation(rotationTorque); // 3
    // the average of three agents’ distance between …: 1
    // the angle enclosed by adjacent agents and the target: 2
    var angleBetween = Vector2.Dot(transform.InverseTransformDirection(transform.forward),
        transform.InverseTransformDirection(new Vector2(tarToAgent.x, tarToAgent.z)));
    sensor.AddObservation(angleBetween); // 1
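
For comparison, the alignment between the agent's forward direction and the direction to the target can be written as a normalized dot product on the ground plane. The sketch below is plain C# with no Unity types, and assumes both directions are already expressed in the same space as explicit (x, z) pairs. `PlanarAngle` and `CosAngleXZ` are hypothetical names used only for illustration, not part of ML-Agents or of the project above. Working with explicit x and z components may be safer here, because Unity's implicit Vector3-to-Vector2 conversion keeps x and y and drops z.

```csharp
using System;

public static class PlanarAngle
{
    // Cosine of the angle between direction a and direction b on the XZ plane.
    // Returns 0 when either direction is (nearly) zero length, so the result
    // is always a finite value in [-1, 1] and is safe to use as an observation.
    public static float CosAngleXZ(float ax, float az, float bx, float bz)
    {
        float lenA = (float)Math.Sqrt(ax * ax + az * az);
        float lenB = (float)Math.Sqrt(bx * bx + bz * bz);
        if (lenA < 1e-6f || lenB < 1e-6f)
        {
            return 0f; // undefined angle: fall back to a neutral value
        }

        float cos = (ax * bx + az * bz) / (lenA * lenB);

        // Clamp to absorb small floating-point overshoot.
        return Math.Max(-1f, Math.Min(1f, cos));
    }
}
```

In a Unity agent this could be called with the x and z components of `transform.forward` and of the world-space vector to the target, for example `PlanarAngle.CosAngleXZ(transform.forward.x, transform.forward.z, tarToAgent.x, tarToAgent.z)`.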

2. action:
    var forwardGo = Vector3.zero;
    var rotationGo = Vector3.zero;

    var continueAction = actionBuffers.ContinuousActions;
    var a1 = Mathf.Clamp(continueAction[0], 0, 1);
    var a2 = Mathf.Clamp(continueAction[1], -1, 1);

    forwardGo = transform.InverseTransformVector(transform.forward) * a1;
    rotationGo = transform.InverseTransformVector(transform.up) * a2;

    forwardForce = forwardGo * m_Setting.agentSpeed;
    rotationTorque = rotationGo * m_Setting.agentAngularSpeed;

    m_AgentRb.AddRelativeForce(forwardForce);
    m_AgentRb.AddRelativeTorque(rotationTorque);
3. reward:
    single agent: distance, angle
    group: task achieved, collision, out of bounds, time penalty
4. yaml:
My Behavior:
    trainer_type: poca
    hyperparameters:
      batch_size: 512
      buffer_size: 10240
      learning_rate: 0.0003
      beta: 0.005
      epsilon: 0.2
      lambd: 0.95
      num_epoch: 3
      learning_rate_schedule: linear
    network_settings:
      normalize: true
      hidden_units: 512
      num_layers: 3
      vis_encode_type: simple
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
        network_settings:
          normalize: true
    keep_checkpoints: 5
    max_steps: 50000000
    time_horizon: 64
    summary_freq: 30000

After training for a while, the following error appears, and it is raised five times at the same moment:

ArgumentException: NaN increment passed to AddReward.
Unity.MLAgents.Utilities.DebugCheckNanAndInfinity (System.Single value, System.String valueCategory, System.String caller) (at D:/ml-agents-release_17/com.unity.ml-agents/Runtime/Utilities.cs:58)
Unity.MLAgents.Agent.AddReward (System.Single increment) (at D:/ml-agents-release_17/com.unity.ml-agents/Runtime/Agent.cs:729)
PurseAgent.MoveAgent (Unity.MLAgents.Actuators.ActionBuffers actionBuffers) (at Assets/Cooperation/Scripts/PurseAgent.cs:290)
PurseAgent.OnActionReceived (Unity.MLAgents.Actuators.ActionBuffers actions) (at Assets/Cooperation/Scripts/PurseAgent.cs:241)
Unity.MLAgents.Actuators.VectorActuator.OnActionReceived (Unity.MLAgents.Actuators.ActionBuffers actionBuffers) (at D:/ml-agents-release_17/com.unity.ml-agents/Runtime/Actuators/VectorActuator.cs:76)
Unity.MLAgents.Actuators.ActuatorManager.ExecuteActions () (at D:/ml-agents-release_17/com.unity.ml-agents/Runtime/Actuators/ActuatorManager.cs:295)
Unity.MLAgents.Agent.AgentStep () (at D:/ml-agents-release_17/com.unity.ml-agents/Runtime/Agent.cs:1344)
Unity.MLAgents.Academy.EnvironmentStep () (at D:/ml-agents-release_17/com.unity.ml-agents/Runtime/Academy.cs:589)
Unity.MLAgents.AcademyFixedUpdateStepper.FixedUpdate () (at D:/ml-agents-release_17/com.unity.ml-agents/Runtime/Academy.cs:43)
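
The trace shows that the value passed to `AddReward` in `MoveAgent` was NaN. One way to find which reward term is responsible is to check each term before adding it. The sketch below is a minimal, Unity-independent version of such a check. `RewardGuard`, `IsFinite`, and `Sanitize` are hypothetical helper names, not part of ML-Agents, and it writes to the console where a Unity script would more likely call `Debug.LogWarning`. Typical sources of non-finite values include dividing by a distance that can reach zero, normalizing a zero-length vector by hand, and calling `Mathf.Acos` on a value slightly outside [-1, 1].

```csharp
using System;

public static class RewardGuard
{
    // True when the value is neither NaN nor +/-Infinity.
    public static bool IsFinite(float value)
    {
        return !float.IsNaN(value) && !float.IsInfinity(value);
    }

    // Returns the reward unchanged when it is finite. Otherwise reports which
    // term was invalid (label is a free-form name chosen by the caller) and
    // returns 0 so that the episode can continue.
    public static float Sanitize(float reward, string label)
    {
        if (IsFinite(reward))
        {
            return reward;
        }

        Console.Error.WriteLine("Non-finite reward term '" + label + "': " + reward);
        return 0f;
    }
}
```

Usage would look like `AddReward(RewardGuard.Sanitize(distanceReward, "distance"));`, where `distanceReward` stands in for whatever term is computed around line 290 of `PurseAgent.cs`. Returning 0 only hides the symptom; the logged label is what points at the calculation that needs fixing.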