Need help: discrete-action network explanation and Barracuda inference

Hello,

I’m training an ML-Agents agent that chooses among 3 discrete actions, with no branching and no action masks involved.

Looking at the ONNX file, I can see that the network architecture has 2 outputs:

  • discrete_actions (shape = batch x 1)
  • deterministic_discrete_actions (shape = 1 x 1)

Question: What is the difference between them?
Question: Which one is used in ML-Agents training, and which one in ML-Agents inference?

Next, I want to perform inference using Barracuda.

As a test, I added a Barracuda worker to my ML-Agents Agent implementation.
Both ML-Agents and the Barracuda worker load the same ONNX file.

I feed ML-Agents and Barracuda the same data (observations).
I perform inference on both.
The ML-Agents DiscreteActions[0] value and the Barracuda output tensor hold the same value most of the time, but quite often they differ. Why?

Note 1: I’ve tried both network outputs (discrete_actions and deterministic_discrete_actions).
Note 2: This is my code for reading the Barracuda output. Is it OK?

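// PeekOutput returns a tensor that is still owned by the worker, so read it before the next Execute call.
Tensor O = m_Worker.PeekOutput("discrete_actions");
// The output holds a single value: the index of the chosen action.
int action = (int) O[0,0,0,0];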

Ideally, random action selection would be disabled in inference mode, which would make inference deterministic. However, it is not disabled. This looks like a feature request that may or may not get implemented some day.

Fully explained here: "Is Deterministic behaviour on trained models possible?", Issue #2643 in Unity-Technologies/ml-agents on GitHub.
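
To make the point about random actions concrete, here is a minimal sketch in plain C# with no Unity or Barracuda dependencies. It assumes, as the output names suggest but this thread does not confirm, that a sampled output draws an action from the policy's probabilities while a deterministic output picks the most likely action. The class and method names (`ActionSelection`, `ArgMax`, `Sample`) and the probabilities are hypothetical and only for illustration.

```csharp
using System;

public static class ActionSelection
{
    // Greedy choice: index of the largest probability.
    // The same probabilities always give the same action.
    public static int ArgMax(float[] probs)
    {
        int best = 0;
        for (int i = 1; i < probs.Length; i++)
        {
            if (probs[i] > probs[best]) best = i;
        }
        return best;
    }

    // Stochastic choice: draw index i with probability proportional to probs[i].
    // Two samplers with different random states can disagree on identical input.
    public static int Sample(float[] probs, Random rng)
    {
        double total = 0.0;
        foreach (float p in probs) total += p;

        double r = rng.NextDouble() * total;
        double cumulative = 0.0;
        for (int i = 0; i < probs.Length; i++)
        {
            cumulative += probs[i];
            if (r < cumulative) return i;
        }
        return probs.Length - 1; // guard against floating-point round-off
    }
}

public static class Demo
{
    public static void Main()
    {
        // Hypothetical policy output for 3 discrete actions.
        float[] probs = { 0.2f, 0.5f, 0.3f };

        // Two independent random streams stand in for two inference paths.
        var rngA = new Random(1);
        var rngB = new Random(2);

        int disagreements = 0;
        for (int step = 0; step < 1000; step++)
        {
            int a = ActionSelection.Sample(probs, rngA);
            int b = ActionSelection.Sample(probs, rngB);
            if (a != b) disagreements++;
        }

        Console.WriteLine($"greedy action: {ActionSelection.ArgMax(probs)}");
        Console.WriteLine($"sampled actions disagreed on {disagreements} of 1000 steps");
    }
}
```

Under these assumptions, two inference paths that each sample on their own will agree often (when both happen to draw the dominant action) and still differ on a noticeable share of steps, which matches the symptom described in the question. Only the greedy selection is guaranteed to agree for identical observations.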