MLAgents error: SubprocessEnvManager had workers that didn't signal shutdown

Hi, I have an issue where, when I try to train, it simply loads for a couple of seconds and then fails.

I'm using mlagents 0.28 (Python) with the ML-Agents Unity package 2.2.1-exp.1.
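
For reference, training is launched from the venv with something along these lines, followed by pressing Play in the editor (the run id is inferred from the results\ppo paths in the log; there's no trainer config entry for MoveToGoal, hence the warning further down):

mlagents-learn --run-id=ppo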

See full printout below:

            ┐  ╖
        ╓╖╬│╡  ││╬╖╖
    ╓╖╬│││││┘  ╬│││││╬╖
 ╖╬│││││╬╜        ╙╬│││││╖╖                               ╗╗╗
 ╬╬╬╬╖││╦╖        ╖╬││╗╣╣╣╬      ╟╣╣╬    ╟╣╣╣             ╜╜╜  ╟╣╣
 ╬╬╬╬╬╬╬╬╖│╬╖╖╓╬╪│╓╣╣╣╣╣╣╣╬      ╟╣╣╬    ╟╣╣╣ ╒╣╣╖╗╣╣╣╗   ╣╣╣ ╣╣╣╣╣╣ ╟╣╣╖   ╣╣╣
 ╬╬╬╬┐  ╙╬╬╬╬│╓╣╣╣╝╜  ╫╣╣╣╬      ╟╣╣╬    ╟╣╣╣ ╟╣╣╣╙ ╙╣╣╣  ╣╣╣ ╙╟╣╣╜╙  ╫╣╣  ╟╣╣
 ╬╬╬╬┐     ╙╬╬╣╣      ╫╣╣╣╬      ╟╣╣╬    ╟╣╣╣ ╟╣╣╬   ╣╣╣  ╣╣╣  ╟╣╣     ╣╣╣┌╣╣╜
 ╬╬╬╜       ╬╬╣╣      ╙╝╣╣╬      ╙╣╣╣╗╖╓╗╣╣╣╜ ╟╣╣╬   ╣╣╣  ╣╣╣  ╟╣╣╦╓    ╣╣╣╣╣
 ╙   ╓╦╖    ╬╬╣╣   ╓╗╗╖            ╙╝╣╣╣╣╝╜   ╘╝╝╜   ╝╝╝  ╝╝╝   ╙╣╣╣    ╟╣╣╣
   ╩╬╬╬╬╬╬╦╦╬╬╣╣╗╣╣╣╣╣╣╣╝                                             ╫╣╣╣╣
      ╙╬╬╬╬╬╬╬╣╣╣╣╣╣╝╜
          ╙╬╬╬╣╣╣╜
             ╙

 Version information:
  ml-agents: 0.28.0,
  ml-agents-envs: 0.28.0,
  Communicator API: 1.5.0,
  PyTorch: 1.7.1+cu110
[INFO] Listening on port 5004. Start training by pressing the Play button in the Unity Editor.
[INFO] Connected to Unity environment with package version 2.2.1-exp.1 and communication version 1.5.0
[INFO] Connected new brain: MoveToGoal?team=0
[WARNING] Behavior name MoveToGoal does not match any behaviors specified in the trainer configuration file. A default configuration will be used.
[WARNING] Deleting TensorBoard data events.out.tfevents.1648051984.DESKTOP-FHQ6GMJ.11680.0 that was left over from a previous run.
[INFO] Hyperparameters for behavior name MoveToGoal:
        trainer_type:   ppo
        hyperparameters:
          batch_size:   1024
          buffer_size:  10240
          learning_rate:        0.0003
          beta: 0.005
          epsilon:      0.2
          lambd:        0.95
          num_epoch:    3
          learning_rate_schedule:       linear
          beta_schedule:        linear
          epsilon_schedule:     linear
        network_settings:
          normalize:    False
          hidden_units: 128
          num_layers:   2
          vis_encode_type:      simple
          memory:       None
          goal_conditioning_type:       hyper
          deterministic:        False
        reward_signals:
          extrinsic:
            gamma:      0.99
            strength:   1.0
            network_settings:
              normalize:        False
              hidden_units:     128
              num_layers:       2
              vis_encode_type:  simple
              memory:   None
              goal_conditioning_type:   hyper
              deterministic:    False
        init_path:      None
        keep_checkpoints:       5
        checkpoint_interval:    500000
        max_steps:      500000
        time_horizon:   64
        summary_freq:   50000
        threaded:       False
        self_play:      None
        behavioral_cloning:     None
[INFO] Exported results\ppo\MoveToGoal\MoveToGoal-0.onnx
[INFO] Copied results\ppo\MoveToGoal\MoveToGoal-0.onnx to results\ppo\MoveToGoal.onnx.
[ERROR] SubprocessEnvManager had workers that didn't signal shutdown
[ERROR] A SubprocessEnvManager worker did not shut down correctly so it was forcefully terminated.
Traceback (most recent call last):
  File "C:\Users\hjalte\AppData\Local\Programs\Python\Python37\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\Users\hjalte\AppData\Local\Programs\Python\Python37\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\hjalte\Desktop\Szrot\Szrot ML\venv\Scripts\mlagents-learn.exe\__main__.py", line 7, in <module>
  File "C:\Users\hjalte\Desktop\Szrot\Szrot ML\venv\lib\site-packages\mlagents	rainers\learn.py", line 260, in main
    run_cli(parse_command_line())
  File "C:\Users\hjalte\Desktop\Szrot\Szrot ML\venv\lib\site-packages\mlagents	rainers\learn.py", line 256, in run_cli
    run_training(run_seed, options, num_areas)
  File "C:\Users\hjalte\Desktop\Szrot\Szrot ML\venv\lib\site-packages\mlagents	rainers\learn.py", line 132, in run_training
    tc.start_learning(env_manager)
  File "C:\Users\hjalte\Desktop\Szrot\Szrot ML\venv\lib\site-packages\mlagents_envs	imers.py", line 305, in wrapped
    return func(*args, **kwargs)
  File "C:\Users\hjalte\Desktop\Szrot\Szrot ML\venv\lib\site-packages\mlagents	rainers	rainer_controller.py", line 176, in start_learning
    n_steps = self.advance(env_manager)
  File "C:\Users\hjalte\Desktop\Szrot\Szrot ML\venv\lib\site-packages\mlagents_envs	imers.py", line 305, in wrapped
    return func(*args, **kwargs)
  File "C:\Users\hjalte\Desktop\Szrot\Szrot ML\venv\lib\site-packages\mlagents	rainers	rainer_controller.py", line 234, in advance
    new_step_infos = env_manager.get_steps()
  File "C:\Users\hjalte\Desktop\Szrot\Szrot ML\venv\lib\site-packages\mlagents	rainers\env_manager.py", line 124, in get_steps
    new_step_infos = self._step()
  File "C:\Users\hjalte\Desktop\Szrot\Szrot ML\venv\lib\site-packages\mlagents	rainers\subprocess_env_manager.py", line 417, in _step
    step: EnvironmentResponse = self.step_queue.get_nowait()
  File "C:\Users\hjalte\AppData\Local\Programs\Python\Python37\lib\multiprocessing\queues.py", line 126, in get_nowait
    return self.get(False)
  File "C:\Users\hjalte\AppData\Local\Programs\Python\Python37\lib\multiprocessing\queues.py", line 109, in get
    self._sem.release()
ValueError: semaphore or lock released too many times

This exact code worked when I was using mlagents 0.16. I completely removed Python and everything else before installing 0.28, so this is definitely a clean install.
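
Side note on the MoveToGoal warning in the log: there is no MoveToGoal entry in any trainer config, so training falls back to the defaults it dumps. A minimal config mirroring those defaults would be something like the sketch below (the file name config/MoveToGoal.yaml is arbitrary; it would be passed as mlagents-learn config/MoveToGoal.yaml --run-id=ppo):

behaviors:
  MoveToGoal:
    trainer_type: ppo
    hyperparameters:
      batch_size: 1024
      buffer_size: 10240
      learning_rate: 0.0003
      beta: 0.005
      epsilon: 0.2
      lambd: 0.95
      num_epoch: 3
      learning_rate_schedule: linear
    network_settings:
      normalize: false
      hidden_units: 128
      num_layers: 2
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
    max_steps: 500000
    time_horizon: 64
    summary_freq: 50000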

For completeness, I’m including the agent I’m trying to train below:

using Unity.MLAgents;
using Unity.MLAgents.Actuators;
using Unity.MLAgents.Sensors;
using UnityEngine;

public class TestAgent : Agent
{
    public Transform target;
    public float speed = 5f;

    private MeshRenderer floorRender;
    private MaterialPropertyBlock mpb;

    private void Start()
    {
        // Cache the ground's renderer so the floor can be flashed on episode end.
        floorRender = transform.parent.Find("Ground").GetComponent<MeshRenderer>();
        mpb = new MaterialPropertyBlock();
    }

    public override void OnEpisodeBegin()
    {
        base.OnEpisodeBegin();

        // Respawn the agent and the target at random spots on the platform.
        transform.localPosition = new Vector3(Random.Range(-3f, 3f), 0, Random.Range(-3f, 3f));
        target.localPosition    = new Vector3(Random.Range(-3f, 3f), 0, Random.Range(-3f, 3f));
    }

    public override void CollectObservations(VectorSensor sensor)
    {
        base.CollectObservations(sensor);

        // 6 floats total: agent position (3) + target position (3).
        sensor.AddObservation(transform.localPosition);
        sensor.AddObservation(target.localPosition);
    }

    public override void OnActionReceived(ActionBuffers actions)
    {
        base.OnActionReceived(actions);

        // Two continuous actions drive movement on the XZ plane.
        float moveX = actions.ContinuousActions[0];
        float moveZ = actions.ContinuousActions[1];

        transform.localPosition += new Vector3(moveX, 0, moveZ) * Time.deltaTime * speed;
    }

    public override void Heuristic(in ActionBuffers actionsOut)
    {
        base.Heuristic(actionsOut);

        // Manual control for testing: map the keyboard axes onto the two actions.
        ActionSegment<float> contActions = actionsOut.ContinuousActions;
        contActions[0] = Input.GetAxisRaw("Horizontal");
        contActions[1] = Input.GetAxisRaw("Vertical");
    }

    private void OnTriggerEnter(Collider other)
    {
        // Penalize hitting a wall, reward reaching the goal; flash the floor either way.
        if (other.CompareTag("Wall"))
        {
            SetReward(-1f);
            mpb.SetColor("_Color", Color.red);
            floorRender.SetPropertyBlock(mpb);
            EndEpisode();
        }
        else if (other.CompareTag("Goal"))
        {
            SetReward(1f);
            mpb.SetColor("_Color", Color.green);
            floorRender.SetPropertyBlock(mpb);
            EndEpisode();
        }
    }
}
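
(For this agent, the Behavior Parameters component has to match the code: behavior name MoveToGoal, Vector Observation Space Size 6 for the two Vector3 observations, and 2 continuous actions.)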

Update: I fixed this by upgrading Python to 3.9.11 and reinstalling ml-agents on top of it:

 Version information:
  ml-agents: 0.30.0,
  ml-agents-envs: 0.30.0,
  Communicator API: 1.5.0,
  PyTorch: 1.7.1+cu110

…then I had to run:

pip install protobuf==3.20.3

(presumably because protobuf 4.x doesn't work with the generated code ml-agents ships), and after that the training worked just fine!
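
For anyone hitting the same thing, the full sequence was roughly this (the versions are just what pip resolved on my machine):

# in a fresh venv on Python 3.9.11
pip install mlagents            # gave me ml-agents 0.30.0
pip install protobuf==3.20.3    # pin protobuf; 4.x breaks the generated protobuf code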