Hello,
I’m training an ML-Agents agent that performs discrete actions (3 possible actions), with no branching and no masking involved.
Looking at the ONNX file, I can see that the network architecture has 2 outputs:
- discrete_actions (shape = batch x 1)
- deterministic_discrete_actions (shape = 1 x 1)
Question: What is the difference between them?
Question: Which one is used during ML-Agents training, and which one during ML-Agents inference?
Next, I want to perform inference using Barracuda.
I’ve set up a test inside my ML-Agents Agent implementation that adds a Barracuda worker.
Both ML-Agents and the Barracuda worker load the same ONNX file.
I feed ML-Agents and Barracuda the same data (observations).
I perform inference on both.
The ML-Agents DiscreteActions[0] value and the Barracuda output tensor hold the same value most of the time, but quite often they differ. Why?
Note 1: I’ve tried both network outputs (discrete_actions and deterministic_discrete_actions).
Note 2: This is the code I use to get the Barracuda output. Is it OK?
Tensor O = m_Worker.PeekOutput("discrete_actions");
int action = (int) O[0,0,0,0];
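
For context, the two lines above sit in a setup that looks roughly like the sketch below. It is only a minimal sketch, not my exact code: the class name BarracudaActionReader, the modelAsset field and the Infer helper are hypothetical, and the input names "obs_0" and "action_masks" are assumptions that should be checked against the inputs listed in the ONNX file. The all-ones mask assumes that a value of 1 means the action is allowed.

```csharp
using System.Collections.Generic;
using Unity.Barracuda;
using UnityEngine;

// Hypothetical helper that runs the same ONNX model through a Barracuda worker.
public class BarracudaActionReader : MonoBehaviour
{
    // Assumption: the imported ONNX asset is assigned in the Inspector.
    public NNModel modelAsset;

    // Assumption: 3 discrete actions in a single branch, as in the post.
    const int k_NumActions = 3;

    IWorker m_Worker;

    void Start()
    {
        Model model = ModelLoader.Load(modelAsset);
        m_Worker = WorkerFactory.CreateWorker(WorkerFactory.Type.CSharpBurst, model);
    }

    // outputName is either "discrete_actions" or "deterministic_discrete_actions".
    public int Infer(float[] observation, string outputName)
    {
        // All-ones mask: assumed to mean that every action is allowed.
        var maskData = new float[k_NumActions];
        for (int i = 0; i < k_NumActions; i++) maskData[i] = 1f;

        // Input tensors are owned by the caller and must be disposed.
        using (var obs = new Tensor(1, observation.Length, observation))
        using (var mask = new Tensor(1, k_NumActions, maskData))
        {
            // Assumed input names; verify them in the ONNX file.
            var inputs = new Dictionary<string, Tensor>
            {
                { "obs_0", obs },
                { "action_masks", mask }
            };
            m_Worker.Execute(inputs);

            // The tensor returned by PeekOutput is owned by the worker, so it is not disposed here.
            Tensor O = m_Worker.PeekOutput(outputName);
            return (int)O[0, 0, 0, 0];
        }
    }

    void OnDestroy()
    {
        m_Worker?.Dispose();
    }
}
```

With a helper like this, calling Infer once per output name on the same observation makes it easy to log both results next to DiscreteActions[0] for the same step.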