SAC agent just stops moving randomly

Hi all, these are my training graphs for PPO (grey) vs SAC (orange). I am training a car to drive within its lane.
7403765--904976--upload_2021-8-10_10-47-16.png

The PPO agent has no issues, but during training the SAC agent will suddenly stop moving, as though it's taking a break, then continue driving after a random period of time. Its performance also appears to be decreasing, and it sometimes just drives out of its lane. PPO does not experience this problem at all.

I am using ML-Agents 2.1.0.

Hi, SAC will “stop” very often because it needs to update the policy. SAC is an off-policy algorithm, meaning that it can use a lot more data than PPO (hence the updates can take a lot longer). If your environment does not rely on real time, then training should be fine. SAC and PPO can perform quite differently depending on the environment; some environments are easier for PPO to solve, and this might be the case here.

Hi @vincentpierre, when I mentioned “stop”, I didn't mean the pause for updating the policy. The car literally slows down gradually to a stop and stays still for a while even though the policy is not being updated. I know this because when the policy updates, my Unity editor “freezes” for about a second, but in this case the car doesn't move across several “freezes” (policy updates). Unless “freezing” doesn't mean the policy is updating, in which case I might be wrong.

Strange that SAC would send a constant action for not moving. It could be because the entropy target is too low. Does moving yield some reward (or at least avoid some penalty)? SAC should take very frequent random actions, so I am quite surprised. Is it easy for the car to move, or does the agent need to hold the forward button for a while before the car moves? Maybe you could try adding an extra reward for moving?
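To make the "extra reward for moving" suggestion concrete, here is a minimal sketch of a per-step reward that pays for forward speed while in lane and charges a small penalty for standing still. It is written in Python for illustration only; in a Unity project the same logic would live in the agent's C# script. The function name, parameters, and all numeric values are hypothetical and would need tuning for the actual environment.

```python
def shaped_reward(
    forward_speed: float,
    in_lane: bool,
    speed_scale: float = 0.01,     # hypothetical multiplier on forward speed
    idle_penalty: float = 0.005,   # hypothetical cost per step while stopped
    idle_threshold: float = 0.1,   # speeds below this count as "stopped"
) -> float:
    """Per-step reward: speed bonus while in lane, small penalty for idling."""
    reward = 0.0
    if in_lane:
        # Reward proportional to forward speed; reversing earns nothing.
        reward += speed_scale * max(forward_speed, 0.0)
    if abs(forward_speed) < idle_threshold:
        # Stopping is never "free": it always costs a little.
        reward -= idle_penalty
    return reward


if __name__ == "__main__":
    print(shaped_reward(10.0, in_lane=True))   # moving in lane: positive
    print(shaped_reward(0.0, in_lane=True))    # stopped in lane: negative
```

With only a velocity multiplier, standing still yields a reward of zero, which can look acceptable to the agent if driving on risks a penalty. The idle term makes stopping strictly worse than driving slowly.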

Yup, I have a reward multiplier based on the agent's forward velocity. It moves perfectly fine on other parts of the track. Only a certain stretch of track (which isn't a sharp turn or anything difficult) has this “decelerate to a stop” issue.

Is there something specific happening in this part of the track? Are the observations all right around that spot, or do they take values not seen so far? Maybe there is an invisible collider that confuses the raycasts?
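One way to check whether observations on that stretch take values not seen elsewhere is to record the per-dimension range on the parts of the track that work, then flag any observation that falls outside it. The sketch below assumes the observation vectors have been logged as plain lists of floats; the class and method names are hypothetical and not part of ML-Agents.

```python
from typing import List, Sequence


class ObservationRangeTracker:
    """Tracks the min/max seen per observation dimension."""

    def __init__(self, size: int) -> None:
        self.low = [float("inf")] * size
        self.high = [float("-inf")] * size

    def update(self, obs: Sequence[float]) -> None:
        # Widen the known range with a "normal" observation.
        for i, value in enumerate(obs):
            self.low[i] = min(self.low[i], value)
            self.high[i] = max(self.high[i], value)

    def out_of_range(self, obs: Sequence[float], margin: float = 0.0) -> List[int]:
        # Indices of dimensions whose value lies outside the recorded range.
        return [
            i
            for i, value in enumerate(obs)
            if value < self.low[i] - margin or value > self.high[i] + margin
        ]


if __name__ == "__main__":
    tracker = ObservationRangeTracker(size=3)
    for obs in ([0.1, 0.5, 1.0], [0.2, 0.4, 0.9]):  # made-up "good track" samples
        tracker.update(obs)
    # Made-up sample from the problem stretch: the third value is far too low.
    print(tracker.out_of_range([0.15, 0.45, 0.0]))
```

If a raycast dimension is flagged only on the problem stretch, that points to the scene (for example a stray collider) rather than to the algorithm.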

@vincentpierre Nope, it's a perfectly normal part of the track, and there is nothing wrong with the colliders either. PPO trains perfectly with no issues, so could something be wrong with the SAC algorithm?
Also, can I check what the expected behaviour of entropy is for SAC? Thanks!

Entropy is supposed to eventually go down when using PPO or SAC as the algorithm gains more confidence. SAC is much more useful in stationary environments. From this issue, I cannot tell what could be the cause of this failure of SAC.
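As a rough illustration of why entropy falls as the policy gains confidence, the sketch below computes the differential entropy of a one-dimensional Gaussian action distribution for a few standard deviations. It assumes a plain Gaussian policy; SAC implementations typically squash actions with tanh, which adds a correction term not shown here, so the numbers are indicative only.

```python
import math


def gaussian_entropy(std: float) -> float:
    """Differential entropy (in nats) of a 1-D Gaussian with the given std."""
    return 0.5 * math.log(2.0 * math.pi * math.e * std ** 2)


if __name__ == "__main__":
    # A narrower action distribution (smaller std) has lower entropy.
    for std in (1.0, 0.5, 0.1):
        print(f"std={std:4.1f}  entropy={gaussian_entropy(std):+.3f}")
```

The entropy shrinks as the standard deviation shrinks and becomes negative for small values such as 0.1, so a low or even negative entropy curve for continuous actions is not an error in itself. SAC's entropy term is a bonus traded off against reward, so entropy is not pushed up unconditionally and can still decrease as the policy commits to actions.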

Sorry, just to clarify: from what I know, SAC tries to maximise entropy, but in this case it goes down and is lower than PPO's?
orange = PPO
blue = SAC
7512409--926294--upload_2021-9-22_10-55-26.png