Hi.
Running trained models outside of Unity is not a supported feature. These models are built to work with the Unity Inference Engine, which is why they are hard to read from Python.
Action masks let you mask out (forbid) some of the actions. The values are zeros and ones: 1 means the action is allowed and 0 means it is masked. If you don't want to mask anything, set this tensor to all ones.
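If you need to build the mask yourself in Python, here is a minimal NumPy sketch. It assumes the mask is a float tensor of shape (batch size, total number of actions across all branches), with the branches laid out one after another, 1.0 for allowed and 0.0 for masked. `make_action_mask` is a hypothetical helper, not part of ML-Agents.

```python
import numpy as np

def make_action_mask(branch_sizes, blocked=(), batch_size=1):
    """Build a flattened action mask: 1.0 = allowed, 0.0 = masked (assumed convention)."""
    # Column where each branch starts in the flattened mask, e.g. [0, 3] for [3, 3].
    offsets = np.concatenate(([0], np.cumsum(branch_sizes)[:-1]))
    mask = np.ones((batch_size, int(sum(branch_sizes))), dtype=np.float32)
    for branch, action in blocked:
        if not 0 <= action < branch_sizes[branch]:
            raise ValueError(f"action {action} out of range for branch {branch}")
        mask[:, offsets[branch] + action] = 0.0
    return mask

no_masking = make_action_mask([3, 3])                     # nothing masked
one_blocked = make_action_mask([3, 3], blocked=[(1, 0)])  # forbid action 0 of the second branch
print(no_masking)   # [[1. 1. 1. 1. 1. 1.]]
print(one_blocked)  # [[1. 1. 1. 0. 1. 1.]]
```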
Regarding the actions: for legacy reasons, the output holds the log probabilities of the actions for each branch. Since you have 2 branches with 3 possible actions each, the first 3 numbers are the log probabilities of the first branch and the last 3 are those of the second branch.
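Splitting that flat output into one array per branch could look like this. It is a sketch that assumes the output has shape (batch size, sum of branch sizes) with branches concatenated in order; `split_branches` is a hypothetical name.

```python
import numpy as np

def split_branches(output, branch_sizes):
    """Split a (batch, sum(branch_sizes)) output into one array per branch."""
    output = np.asarray(output)
    split_points = np.cumsum(branch_sizes)[:-1]  # e.g. [3] for branch sizes [3, 3]
    return np.split(output, split_points, axis=-1)

output = np.array([[-1.6118095e+01, -1.5712630e+01, 1.1920928e-07,
                    -1.6118095e+01, -1.7318338e-02, -4.0646248e+00]], dtype=np.float32)
first, second = split_branches(output, [3, 3])  # each has shape (1, 3)
```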
To sample from them, exponentiate them to get probabilities, then draw one action per branch from the resulting categorical distribution (a multinomial with a single draw). For example:
```
array([[-1.6118095e+01, -1.5712630e+01, 1.1920928e-07, -1.6118095e+01, -1.7318338e-02, -4.0646248e+00]], dtype=float32)
```
This output means that the first branch has log probabilities -1.6118095e+01, -1.5712630e+01, 1.1920928e-07.
Note that exp(-1.6118095e+01) + exp(-1.5712630e+01) + exp(1.1920928e-07) ≈ 1 (up to float32 rounding).
The second branch has log probabilities -1.6118095e+01, -1.7318338e-02, -4.0646248e+00.
Likewise, exp(-1.6118095e+01) + exp(-1.7318338e-02) + exp(-4.0646248e+00) ≈ 1.
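You can check this normalization numerically. The sketch below computes the sum of exp(log p) for each branch in float64; the sums come out within about 1e-6 of 1, with the small deviation coming from float32 rounding in the model output. `branch_probability_sums` is a hypothetical helper.

```python
import numpy as np

def branch_probability_sums(log_probs, branch_sizes):
    """Return sum(exp(log p)) for each branch; each should be close to 1."""
    flat = np.asarray(log_probs, dtype=np.float64).reshape(-1)
    split_points = np.cumsum(branch_sizes)[:-1]
    return [float(np.exp(branch).sum()) for branch in np.split(flat, split_points)]

output = np.array([[-1.6118095e+01, -1.5712630e+01, 1.1920928e-07,
                    -1.6118095e+01, -1.7318338e-02, -4.0646248e+00]], dtype=np.float32)
for i, total in enumerate(branch_probability_sums(output, [3, 3])):
    print(f"branch {i}: sum of probabilities = {total:.7f}")
```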
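If we look only at the first branch, the log probabilities -1.6118095e+01, -1.5712630e+01, 1.1920928e-07 become 1.00000065e-7, 1.50000081e-7 and 1.00000012 after exponentiation (the tiny positive log probability is just float32 rounding error, since a true probability cannot exceed 1). So for the first branch, the selected action is 2 (the third option, since indices start at 0) with very high certainty.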
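Putting these steps together, here is a minimal NumPy sketch that samples one action per branch (exponentiate, renormalize, then make one categorical draw) plus a deterministic argmax alternative. It assumes the output layout described above; `sample_actions` and `greedy_actions` are hypothetical helpers, not ML-Agents functions. The explicit renormalization is there because the float32 probabilities only sum to approximately 1, and `Generator.choice` checks that `p` sums to 1.

```python
import numpy as np

def sample_actions(log_probs, branch_sizes, rng=None):
    """Sample one action index per branch from concatenated per-branch log probabilities."""
    rng = np.random.default_rng() if rng is None else rng
    flat = np.asarray(log_probs, dtype=np.float64).reshape(-1)
    actions = []
    start = 0
    for size in branch_sizes:
        probs = np.exp(flat[start:start + size])  # log probabilities -> probabilities
        probs /= probs.sum()  # renormalize: float32 values only sum to ~1
        actions.append(int(rng.choice(size, p=probs)))  # one categorical draw
        start += size
    return actions

def greedy_actions(log_probs, branch_sizes):
    """Most likely action per branch; exp is monotonic, so argmax of log probs suffices."""
    flat = np.asarray(log_probs).reshape(-1)
    split_points = np.cumsum(branch_sizes)[:-1]
    return [int(np.argmax(branch)) for branch in np.split(flat, split_points)]

output = np.array([[-1.6118095e+01, -1.5712630e+01, 1.1920928e-07,
                    -1.6118095e+01, -1.7318338e-02, -4.0646248e+00]], dtype=np.float32)
print(sample_actions(output, [3, 3], np.random.default_rng(0)))  # random; [2, 1] about 98% of the time
print(greedy_actions(output, [3, 3]))  # [2, 1]
```

Sampling keeps the stochastic behavior the policy was trained with, while argmax always returns the single most likely action per branch.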