Unity 2023.2.13f1
ML-Agents 1.0.0
I have been struggling to get SAC to work in Unity ML-Agents, despite trying many different config parameters. In the same environment, PPO learns and converges within 1.5 million steps, while SAC still struggles to find a solution after 6 million steps and also suffers from catastrophic forgetting. While training with SAC, the environment also repeatedly freezes for a long time after about 2000 steps.
The environment is identical for the SAC and PPO runs; only the trainer config changes.
Is there currently an issue with SAC inside ML-Agents?
Agent action space: Discrete
This is the learning graph for PPO, which shows steady learning:

This is the learning graph for SAC, which is very unstable and struggles to learn:

Config for PPO:
FindExitAgent:
  trainer_type: ppo
  hyperparameters:
    batch_size: 120
    buffer_size: 12000
    learning_rate: 0.0003
    beta: 0.001
    epsilon: 0.2
    lambd: 0.95
    num_epoch: 3
    learning_rate_schedule: linear
  network_settings:
    normalize: true
    hidden_units: 256
    num_layers: 2
    vis_encode_type: simple
  reward_signals:
    extrinsic:
      gamma: 0.99
      strength: 1.0
  keep_checkpoints: 10
  max_steps: 500000000
  time_horizon: 1000
  summary_freq: 12000
  threaded: true
Config for SAC:
FindExitAgent:
  trainer_type: sac
  hyperparameters:
    learning_rate: 0.0003
    learning_rate_schedule: constant
    batch_size: 256
    buffer_size: 500000
    buffer_init_steps: 0
    tau: 0.005
    steps_per_update: 20.0
    save_replay_buffer: false
    init_entcoef: 1.0
    reward_signal_steps_per_update: 20.0
  network_settings:
    normalize: true
    hidden_units: 512
    num_layers: 3
    vis_encode_type: simple
  reward_signals:
    extrinsic:
      gamma: 0.995
      strength: 1.0
  keep_checkpoints: 5
  max_steps: 5000000
  time_horizon: 1000
  summary_freq: 30000
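To make the two configs easier to compare, here is a rough back-of-the-envelope sketch in Python of how many gradient updates each trainer performs per million environment steps. It assumes the documented meaning of the settings: PPO makes `num_epoch` passes over each full buffer in minibatches of `batch_size`, and SAC performs on average one update per `steps_per_update` agent steps. The helper names are made up for illustration and are not part of ML-Agents, and the real trainers may schedule updates differently in detail.

```python
# Back-of-the-envelope update counts derived from the two configs above.
# ppo_updates / sac_updates are hypothetical helpers, not ML-Agents APIs.

def ppo_updates(total_steps, buffer_size, batch_size, num_epoch):
    """Minibatch gradient updates PPO performs over total_steps."""
    buffers_filled = total_steps // buffer_size        # full buffers collected
    minibatches_per_pass = buffer_size // batch_size   # minibatches in one pass
    return buffers_filled * minibatches_per_pass * num_epoch


def sac_updates(total_steps, steps_per_update, buffer_init_steps=0):
    """Approximate gradient updates SAC performs over total_steps."""
    return int((total_steps - buffer_init_steps) / steps_per_update)


STEPS = 1_000_000
PPO_BATCH = 120
SAC_BATCH = 256

ppo = ppo_updates(STEPS, buffer_size=12000, batch_size=PPO_BATCH, num_epoch=3)
sac = sac_updates(STEPS, steps_per_update=20.0, buffer_init_steps=0)

print(f"PPO: {ppo} updates, {ppo * PPO_BATCH} samples processed")
print(f"SAC: {sac} updates, {sac * SAC_BATCH} samples processed")
```

Under these assumptions, SAC performs roughly twice as many updates as PPO per million steps, each on a larger batch and a larger network. That may account for part of the wall-clock difference, but it does not by itself explain the freezes or the unstable learning.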
Additionally, training PPO took only around 30 minutes to reach a million steps, while SAC had to run for nearly 2 hours. I expected SAC to be slower because of the buffer size and other factors, but the difference is immense.
Out of curiosity, I also tracked the time between each Academy step (orange: PPO, gray: SAC).
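For reference, the timings above translate into the following approximate throughput. This is only arithmetic on the quoted figures, assuming the "nearly 2 hours" for SAC also refers to one million steps and rounding it to exactly 2 hours.

```python
# Throughput implied by the quoted timings (approximate figures).
STEPS = 1_000_000
ppo_seconds = 30 * 60       # ~30 minutes for one million steps
sac_seconds = 2 * 60 * 60   # "nearly 2 hours", rounded to 2 hours (assumption)

ppo_rate = STEPS / ppo_seconds        # environment steps per second
sac_rate = STEPS / sac_seconds
slowdown = sac_seconds / ppo_seconds  # how many times slower SAC ran

print(f"PPO: {ppo_rate:.0f} steps/s")
print(f"SAC: {sac_rate:.0f} steps/s")
print(f"SAC is about {slowdown:.0f}x slower in wall-clock time")
```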
What could be the issue here?
Edit: For SAC I have tried numerous configs, varying the network size, number of layers, and batch size, but none of them has worked.
