Good morning.

I trained a model with SAC. Training converged and everything looks good: my agent completes its task in around 99.5% of cases. I wanted to try the model in inference mode to see whether it reaches a higher success rate.

In inference mode, which action does the policy network choose? During training we learn a stochastic policy, but does inference mode produce a deterministic action? If so, which one: the mean of the distribution described by the stochastic policy, or its mode?
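
To make the question concrete, here is a minimal sketch of the two options as I understand them. It assumes the policy is a tanh-squashed diagonal Gaussian, which is common in SAC implementations but may not match every library. The function names `sample_action` and `deterministic_action` are hypothetical and only for illustration.

```python
import numpy as np

def sample_action(mean, log_std, rng):
    """Stochastic action as used in training: a = tanh(mean + std * eps)."""
    std = np.exp(log_std)
    eps = rng.standard_normal(mean.shape)
    return np.tanh(mean + std * eps)

def deterministic_action(mean):
    """One candidate for inference: squash the Gaussian mean, a = tanh(mean)."""
    return np.tanh(mean)

# Hypothetical policy outputs for a 2-dimensional action space.
rng = np.random.default_rng(0)
mean = np.array([0.5, -1.0])
log_std = np.array([-0.5, -0.5])

print("sampled:      ", sample_action(mean, log_std, rng))
print("deterministic:", deterministic_action(mean))
```

Under this assumption, `tanh(mean)` is the median of the squashed distribution, since tanh is monotonic, but in general it is neither the mean nor the mode of that distribution. That is part of why I am asking which quantity inference mode actually uses.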

Thanks a lot