Agents not resetting at the same time

I am creating a competitive shooting game in which two agents try to shoot each other. The problem is this: when one agent's health drops to 0 or below, EndEpisode() is called on both of them. The agent that died resets its position as expected, but the one that killed it does not.

using System.Collections;
using System.Collections.Generic;
using UnityEngine;
using Unity.MLAgents;
using Unity.MLAgents.Sensors;
using Unity.Mathematics;

public class NormalAgent : Agent
{
    public float speed = 3f;
    public float rotationSpeed = 3f;
    public int damagePerShot = 34;
    public Transform shootingPoint;
    public float timeBetweenShots = 0.3f; // Value still needs tuning

    public int range = 100;
    public int startingHealth = 100;
    int currentHealth = 100;
    Vector3 startingPosition;
    Rigidbody playerRigidbody;
    Vector3 movement;
    BoxCollider boxCollider;
    float timer;
    float efxDisplayTime = 0.2f;
    Ray shootRay;
    RaycastHit shootHit;
    int shootableMask;
    // int floorMask; No need for agents
    LineRenderer gunLine;
    EnvironmentParameters ResetPara;
  
    /* All engine functions here */
    public override void Initialize(){
        playerRigidbody = GetComponent<Rigidbody>();
        startingPosition = transform.position;
        shootableMask = LayerMask.GetMask("Shootable");
        gunLine = GetComponentInChildren<LineRenderer>();
        boxCollider = GetComponent<BoxCollider>();
        ResetPara = Academy.Instance.EnvironmentParameters;

    }

    public override void OnEpisodeBegin()
    {
        playerRigidbody.position = startingPosition;
        Reset();
    }

    public override void OnActionReceived(float[] vectorAction){
        if(Mathf.FloorToInt(vectorAction[0]) >= 1 && timer >= timeBetweenShots)
            Shoot();
        move_v(vectorAction[1], vectorAction[2]);
        turning(vectorAction[3]);
    }

    public override void CollectObservations(VectorSensor sensor)
    {
        sensor.AddObservation(timer);
        sensor.AddObservation(currentHealth);
    }

    public override void Heuristic(float[] actionsOut)
    {
        actionsOut[0] = (Input.GetKey(KeyCode.Space) && timer >= timeBetweenShots) ? 1f : 0f;
        actionsOut[1] = Input.GetAxis("Horizontal");
        actionsOut[2] = Input.GetAxis("Vertical");
        actionsOut[3] = 0f;
        if(Input.GetKey(KeyCode.Q))
            actionsOut[3] = -1f;
        if(Input.GetKey(KeyCode.E))
            actionsOut[3] = 1f;
    }
    void FixedUpdate(){
        timer += Time.deltaTime;
        if(timer >= timeBetweenShots * efxDisplayTime)
            DisableEffects();
    }
...

    void Shoot(){
        ...

        if(Physics.Raycast(shootRay, out shootHit, range, shootableMask)){
            NormalAgent normalAgent = shootHit.collider.GetComponent<NormalAgent>();
            if(normalAgent != null){
                Debug.Log("Hit!");
                AddReward(0.33f);
                normalAgent.TakeDamage(damagePerShot, this);
            }
            else{
                Debug.Log("Missed!");
            }
          
            ...
    }

    void DisableEffects(){
        ...
    }

    public void TakeDamage(int damage, NormalAgent NA){
        Debug.Log(this + " Take " + damage + " damage from: " + NA);
        AddReward(-0.33f);
        currentHealth -= damage;
        if(currentHealth <= 0){
            RegisterDeath();
            NA.RegisterKill();
        }
    }

    void RegisterDeath(){
        AddReward(-1f);
        EndEpisode();
        Debug.Log(this + " Died!");
    }

    void RegisterKill(){
        AddReward(1f);
        EndEpisode();
        Debug.Log(this + " Killed one!");
    }

    void Reset(){
        currentHealth = Mathf.FloorToInt(ResetPara.GetWithDefault("Health", 100f));
        timer = ResetPara.GetWithDefault("Timer", 0f);
    }
}

Here's the code. When I test it, both RegisterDeath() and RegisterKill() are called by the right agent, so I assume each agent has called EndEpisode(). So why don't they both reset their positions?

I'm new to ML-Agents, so there may be other bugs in the code. Please feel free to point them out. Thanks, any advice would help a lot!

I'm using Unity 2019.4.9f and ml-agents version 0.21.1.

Oh, I figured it out myself! It was a coding error: I should set rigidbody.transform.position, not rigidbody.position. But then how did the agent that died actually manage to reset successfully?
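
For anyone who lands here later, this is roughly what the reset looks like after the change. It is a minimal sketch, not the full agent: the class name ResettableAgent is made up for this example, and zeroing the velocities is an extra step added on the assumption that leftover momentum should not carry into the next episode. As far as I understand the Unity docs, a value assigned to Rigidbody.position is only applied to the Transform after the next physics simulation step, while transform.position changes right away.

```csharp
using UnityEngine;
using Unity.MLAgents;

// Hypothetical stripped-down agent that only shows the reset logic.
public class ResettableAgent : Agent
{
    Vector3 startingPosition;
    Rigidbody playerRigidbody;

    public override void Initialize()
    {
        playerRigidbody = GetComponent<Rigidbody>();
        startingPosition = transform.position;
    }

    public override void OnEpisodeBegin()
    {
        // Move the Transform directly so the new position is visible immediately.
        playerRigidbody.transform.position = startingPosition;

        // Assumption: clear leftover motion so it does not carry into the new episode.
        playerRigidbody.velocity = Vector3.zero;
        playerRigidbody.angularVelocity = Vector3.zero;
    }
}
```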

Sorry, it's hard to tell from looking at your code what's going on. I'd recommend using the debugger to step through the code, or adding some extra logging statements to OnEpisodeBegin() and Reset().
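
Something along these lines might be enough to see what each agent does when its episode restarts. This is only a sketch: LoggingAgent is a made-up class name, the position assignment mirrors the one in the question, and the log messages are just a suggestion. Logging both the Transform and the Rigidbody position should show whether the two disagree right after the reset.

```csharp
using UnityEngine;
using Unity.MLAgents;

// Hypothetical agent used only to illustrate the extra logging.
public class LoggingAgent : Agent
{
    Vector3 startingPosition;
    Rigidbody playerRigidbody;

    public override void Initialize()
    {
        playerRigidbody = GetComponent<Rigidbody>();
        startingPosition = transform.position;
    }

    public override void OnEpisodeBegin()
    {
        // Log which agent is restarting, on which frame, and where it currently is.
        Debug.Log($"{name} OnEpisodeBegin, frame {Time.frameCount}, before reset: {transform.position}");

        // Same assignment as in the question.
        playerRigidbody.position = startingPosition;

        // Compare what the Transform and the Rigidbody report right after the assignment.
        Debug.Log($"{name} after reset: transform={transform.position}, rigidbody={playerRigidbody.position}");
    }
}
```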

In general, with multiple agents like this, I think it's easier to have a separate class that ends the episodes of the agents, instead of letting one agent end the episode of the other one.
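
A rough sketch of what such a class could look like, under some assumptions: DuelController and ReportKill are hypothetical names (nothing like this ships with ML-Agents), the two agent references are assigned in the Inspector, and the +1 / -1 rewards are copied from RegisterKill() and RegisterDeath() in the question. It only relies on AddReward() and EndEpisode(), which are public on Agent.

```csharp
using UnityEngine;
using Unity.MLAgents;

// Hypothetical helper: a single object owns the end-of-episode logic for both agents.
public class DuelController : MonoBehaviour
{
    // Assumption: both agents are assigned in the Inspector.
    public Agent agentA;
    public Agent agentB;

    // Called by whichever code detects that the victim's health has run out.
    public void ReportKill(Agent killer, Agent victim)
    {
        // Final rewards, using the same values as in the question.
        killer.AddReward(1f);
        victim.AddReward(-1f);

        // End both episodes from one place, right after the rewards are assigned.
        agentA.EndEpisode();
        agentB.EndEpisode();
    }
}
```

With something like this in place, TakeDamage() would call ReportKill(NA, this) on the controller instead of calling RegisterDeath() and NA.RegisterKill() directly, so neither agent ends the other's episode itself.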