SAC, very slow training and freezing environment (No issues with PPO)

I have an environment with 6 identical agents which interact with each other. (This problem also occurs when I only have 1 agent.) Training with PPO and the below config file works as expected, performing around 30k steps in less than a minute.

behaviors:
  CarAgent:
    trainer_type: ppo
    hyperparameters:
      batch_size: 2048
      buffer_size: 20480
      learning_rate: 0.0003
      beta: 0.005
      epsilon: 0.2
      lambd: 0.95
      num_epoch: 3
      learning_rate_schedule: linear
    network_settings:
      normalize: true
      hidden_units: 512
      num_layers: 3
      vis_encode_type: simple
    reward_signals:
      extrinsic:
        gamma: 0.995
        strength: 1.0
    keep_checkpoints: 5
    max_steps: 30000000
    time_horizon: 1000
    summary_freq: 30000
    threaded: true

Using the exact same environment with the below config file, which uses SAC and its relevant hyperparameters, the environment freezes and is only able to compute 30k steps in around 45 minutes to 1.5 hours.

behaviors:
  CarAgent:
    trainer_type: sac
    hyperparameters:
      batch_size: 2048
      buffer_size: 20480
      learning_rate: 0.0003
      learning_rate_schedule: linear
     
      buffer_init_steps: 0
      tau: 0.005
      steps_per_update: 20.0
      save_replay_buffer: false
      init_entcoef: 0.5
      reward_signal_steps_per_update: 10.0
    network_settings:
      normalize: true
      hidden_units: 512
      num_layers: 3
      vis_encode_type: simple
    reward_signals:
      extrinsic:
        gamma: 0.995
        strength: 1.0
    keep_checkpoints: 5
    max_steps: 30000000
    time_horizon: 1000
    summary_freq: 30000
    threaded: true

I also tried the suggestions posted in "SAC long train time on 12 cores AMD 5900X": running with --tensorflow, changing cpu_utils.py, and increasing the batch_size and steps_per_update, with no improvement.

I am also trying to create my own RL trainer using the Python API, and again training time is very slow compared to PPO.

I am using the following command to start training from the CMD

mlagents-learn config_sac.yaml --run-id=SAC --env="....\FinalEnv\Build" --time-scale=10 --quality-level=0 --width=640 --height=640 --force

Below are a few more details.
Version information:
ml-agents: 0.23.0,
ml-agents-envs: 0.23.0,
Communicator API: 1.3.0,
PyTorch: 1.7.0+cu110

I found this issue as well, but I'm not sure what they mean by the inference configuration (python - Unity ML-Agents Running Very Slowly - Stack Overflow). (Just including this for completeness; maybe it's a possible solution.)

Any help would be appreciated.
Thank you in advance.

Let me bounce this off of the team for some guidance.

Hi @digi170

One thing that stands out to me is that your batch size is pretty large and your buffer size is pretty small for SAC. Can you try a batch size of 256 and a buffer size of 1000000? I'd be surprised if this was the cause of such a slowdown, but it should help a bit. I assume you are also running with time-scale=10 with PPO?
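For reference, a minimal sketch of that change, assuming the rest of the hyperparameters block stays the same as in your config:

hyperparameters:
  batch_size: 256        # smaller batches are more typical for SAC
  buffer_size: 1000000   # much larger replay buffer, as suggested above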

The Stack Overflow thread you shared describes a problem we had to deal with when porting the codebase to torch, but we have essentially addressed it to the best of our ability.

@andrewcoh_unity Thank you for your reply and suggestions.

The speed has increased drastically, completing 30000 steps in around 6 minutes, but the Unity environment is still lagging ("Not responding" message on Windows), and this seems to occur mostly just before outputting the summary, i.e. in this case every 30k steps. I have also tried a linear (rather than constant) learning_rate_schedule, a batch size of 128, and a buffer size of 4000000 with TensorFlow (--tensorflow appended to the cmd), which further improved training time to around 2 minutes for 30k steps.
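For reference, the fields I changed in the hyperparameters block now look roughly like this (everything else is unchanged from my config above):

hyperparameters:
  batch_size: 128
  buffer_size: 4000000
  learning_rate_schedule: linear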

Yes, I am also using a time-scale of 10 with PPO.

Is there any environment property that is specific to SAC or should an environment that works with PPO also work with SAC without any changes?

Furthermore, does reducing the batch size, increasing the buffer size, and changing the learning_rate_schedule affect how the agents are trained? I am asking because I want to compare against other models which I trained with the previously defined hyperparameters, and I am not sure if I can still compare them now that these hyperparameters have changed.

Thanks again for your help.

Further to my comment about the Unity environment freezing when the summary frequency is reached, I also noted that pressing Enter in the command prompt while the environment is frozen “unfreezes” it.

Thanks again.

@TreyK-47 @andrewcoh_unity

I have another question regarding the batch size and buffer size. I used the above-mentioned batch_size and buffer_size (128 and 4000000) on 4 environments running 5 agents each (using SAC). Is this combination acceptable, especially since the maximum suggested buffer size is 1000000?

Thank you for any help

I have the same problem. 9 months ago SAC was outperforming PPO for me on a 4 core CPU. Now I have a 12 core CPU and SAC is very slow + freezing the environment.

I’ll ping the team for an update for y’all.

Hey Roboserg and others, we do want to get to the bottom of the SAC slowdown issue. It’s most likely related to our recent switch from TensorFlow to PyTorch. What’s the magnitude of the slowdown between now and 9 months ago, and what version of PyTorch are you running? In our internal testing we saw no more than 20% slowdown between TF and PyTorch on 8-core machines, and a speedup when using certain network architectures (CNNs, LSTMs).

There are a couple of things that are known to speed up PyTorch execution. First, you can try disabling threading in the YAML (threaded: false). Second, the GPU/CUDA version of PyTorch is much better for parallelization, because of how it internally handles multiple network inferences and backprop (I do realize this isn’t an option for everybody). Third, we cap the number of threads PyTorch uses to 4, as we found that PyTorch could interfere with running Unity at higher thread counts. You can change this in ml-agents/mlagents/torch_utils/torch.py if you’d like to experiment. We’ll make this an advanced option in the near future.
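As a concrete example, using the CarAgent behavior block from the configs earlier in this thread, disabling threading is a per-behavior setting (all other settings unchanged):

behaviors:
  CarAgent:
    trainer_type: sac
    # ... hyperparameters, network_settings, reward_signals as before ...
    threaded: false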

Of course the ultimate workaround is to use R10 of the Python code (with --tensorflow) or earlier. You’ll be able to train environments created with the latest Unity package, but without the new Hybrid Actions feature.

Hey all.

Is there any update from the dev team regarding fixing this issue?
I have followed the thread closely, as while running experiments for my dissertation I experienced exactly the same issues. After testing and applying all the proposed solutions, the training time for SAC decreased by only around 5%, while PPO was completing the same scenario 6-10 times faster.

I am also having this issue running PyTorch 1.7.1, CUDA 1.10, mlagents 0.22.0 - is there any update from the dev team in another thread, has this been fixed in a more recent version of mlagents, or did you find any workaround?

I am experiencing a similar issue running PyTorch 1.7.1, CUDA 1.10, mlagents 0.22.0.
The main problem I am having is that both the training and the Unity environment freeze very often, with short periods of smooth training operation in between.
Is there any update from the dev team in another thread, or has this been fixed in a more recent version of mlagents?

This is my training configuration file:
behaviors:
  xyz:
    trainer_type: sac
    hyperparameters:
      batch_size: 128
      buffer_size: 124000
      learning_rate: 3.0e-4
      learning_rate_schedule: constant
      buffer_init_steps: 0
      init_entcoef: 0.6
      tau: 5e-3
      steps_per_update: 1
    network_settings:
      normalize: false
      hidden_units: 128
      num_layers: 2
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
    max_steps: 10000000
    time_horizon: 64
    summary_freq: 50000

I am using 25 agents.

I have the same problem with freezing environments whenever the buffer_size steps are reached. I am using POCA to train the example environment DodgeBall, and the environment freezes for quite some time whenever the buffer_size is reached (GPU usage goes up then, probably because the models are being updated?). Is this behaviour also because of the switch from TensorFlow to PyTorch that you mentioned in your post?
When I last used ML-Agents two years ago these freezes didn't happen during training. Back then I was using PPO, which might be another reason.

Please see this issue thread: Training environment freezes at buffer_size steps · Issue #5786 · Unity-Technologies/ml-agents · GitHub

Any fixes for this? If I --resume the training, it's fast again. Any idea how to fix it? It's a discrete environment (size=400) with a buffer size of 2048.

I have this issue on a Mac M1 Max; it takes forever during the policy update. I set the buffer size to 80000 and the batch size to 8000, and the training is very smooth until the policy starts to update, at which point all environments freeze. On Windows the freeze time is a few seconds; on Mac I'm at around 10 minutes now and it is still frozen.

Maybe you have a GPU on your Windows PC, and that is being used to accelerate the policy update phase? Mind you, 10 minutes sounds a bit long…

I'm trying SAC and it's so slow it's practically unusable in my case; PPO works fine.
The PPO training for my agents takes around 40-50 hours, and I'm trying to improve the training time by using SAC.
Any ideas?

Windows PC - Lenovo Legion
Windows 11 Home
32 GB RAM
Intel Core i7-9750H CPU @ 2.60GHz 2.59 GHz
Nvidia RTX 2070i
Unity ml agents package version 2.0.1
Version information:
ml-agents: 0.30.0,
ml-agents-envs: 0.30.0,
Communicator API: 1.5.0,
PyTorch: 1.7.1+cu110

I'm working with 3 standalone environments, each with 10 agents.
Episode length is 3500 steps.
The environments run with the --no-graphics command.
Any way of improving this?

This is a part of my console output:
You can see that when it reaches 500K steps it starts to freeze/slow down, taking about an hour and a half for each 5000 steps that previously took a few seconds.

[INFO] Driver. Step: 430000. Time Elapsed: 481.979 s. Mean Reward: -1.000. Std of Reward: 0.000. Training.
[INFO] Driver. Step: 435000. Time Elapsed: 486.224 s. No episode was completed since last summary. Training.
[INFO] Driver. Step: 440000. Time Elapsed: 490.896 s. No episode was completed since last summary. Training.
[INFO] Driver. Step: 445000. Time Elapsed: 495.746 s. No episode was completed since last summary. Training.
[INFO] Driver. Step: 450000. Time Elapsed: 500.527 s. No episode was completed since last summary. Training.
[INFO] Driver. Step: 455000. Time Elapsed: 503.927 s. No episode was completed since last summary. Training.
[INFO] Driver. Step: 460000. Time Elapsed: 509.504 s. Mean Reward: -1.000. Std of Reward: 0.000. Training.
[INFO] Driver. Step: 465000. Time Elapsed: 513.474 s. No episode was completed since last summary. Training.
[INFO] Driver. Step: 470000. Time Elapsed: 516.985 s. Mean Reward: -1.000. Std of Reward: 0.000. Training.
[INFO] Driver. Step: 475000. Time Elapsed: 521.133 s. No episode was completed since last summary. Training.
[INFO] Driver. Step: 480000. Time Elapsed: 525.139 s. No episode was completed since last summary. Training.
[INFO] Driver. Step: 485000. Time Elapsed: 530.389 s. No episode was completed since last summary. Training.
[INFO] Driver. Step: 490000. Time Elapsed: 533.989 s. No episode was completed since last summary. Training.
[INFO] Driver. Step: 495000. Time Elapsed: 538.515 s. No episode was completed since last summary. Training.
[INFO] Driver. Step: 500000. Time Elapsed: 542.526 s. No episode was completed since last summary. Training.
[INFO] Driver. Step: 505000. Time Elapsed: 6219.419 s. No episode was completed since last summary. Training.
[INFO] Driver. Step: 510000. Time Elapsed: 12409.326 s. No episode was completed since last summary. Training.
[INFO] Driver. Step: 515000. Time Elapsed: 18577.535 s. No episode was completed since last summary. Training.
[INFO] Driver. Step: 520000. Time Elapsed: 23261.468 s. No episode was completed since last summary. Training.
[INFO] Driver. Step: 525000. Time Elapsed: 29433.116 s. No episode was completed since last summary. Training.

This is my config yaml:

default_settings: null
behaviors:
  Driver:
    trainer_type: sac
    hyperparameters:
      batch_size: 1024
      buffer_size: 1000000
      learning_rate: 0.0003
      learning_rate_schedule: constant
      # SAC-specific hyperparameters
      buffer_init_steps: 500000
      tau: 0.005
      steps_per_update: 10.0
      save_replay_buffer: true
      init_entcoef: 0.75
      reward_signal_steps_per_update: 10.0
    # Configuration of the neural network
    network_settings:
      normalize: false
      hidden_units: 400
      num_layers: 2
      vis_encode_type: simple
      memory: null
      goal_conditioning_type: none
      deterministic: false
    behavioral_cloning:
      demo_path: Demos/
      strength: 0.00028
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
      gail:
        strength: 0.00055
        gamma: 0.99
        demo_path: Demos/
        use_actions: true
        use_vail: false
        network_settings:
          normalize: false
          hidden_units: 256
          num_layers: 2
    # Trainer configurations
    init_path: null
    keep_checkpoints: 50
    checkpoint_interval: 1000000
    max_steps: 50000000
    time_horizon: 128
    summary_freq: 5000
    threaded: false
    self_play: null
env_settings:
  env_path: null
  env_args: null
  base_port: 5005
  num_envs: 1
  num_areas: 1
  seed: -1
  max_lifetime_restarts: 10
  restarts_rate_limit_n: 1
  restarts_rate_limit_period_s: 60
engine_settings:
  width: 84
  height: 84
  quality_level: 0
  time_scale: 10
  target_frame_rate: -1
  capture_frame_rate: 60
  no_graphics: false
environment_parameters:
checkpoint_settings:
  run_id: sac
  initialize_from: null
  load_model: false
  resume: false
  force: true
  train_model: false
  inference: false
  results_dir: results
torch_settings:
  device: null
debug: false

I've fixed it by re-installing TensorFlow for Metal, because it seems it was not installed correctly.


The same issue for me. Following your advice, I have tried several tests. My final test is:

    trainer_type: sac
    hyperparameters:
      learning_rate: 0.0003
      learning_rate_schedule: linear
      batch_size: 2048
      buffer_size: 204800
      buffer_init_steps: 204800

Previously, when buffer_size was not equal to buffer_init_steps (buffer_init_steps = 0, 1000, or 10000), training froze within about 30 minutes. But with buffer_init_steps = buffer_size = 204800, it kept training without freezing for 8 hours, and only then froze.

I found 3 functions related to the buffer_init_steps parameter in ml-agents\mlagents\trainers\trainer\off_policy_trainer.py:

def _is_ready_update(self) -> bool:
def _update_policy(self) -> bool:
def _update_reward_signals(self) -> None:

Looking into the first function, _is_ready_update:

def _is_ready_update(self) -> bool:
    """
    Returns whether or not the trainer has enough elements to run update model
    :return: A boolean corresponding to whether or not _update_policy() can be run
    """
    return (
        self.update_buffer.num_experiences >= self.hyperparameters.batch_size
        and self._step >= self.hyperparameters.buffer_init_steps
    )

Finally, I think the freezing can occur when the policy is updated, and with my settings _is_ready_update is always false.

So I think my SAC training is effectively doing nothing. Please give me your advice.