Inference with an exported .onnx model in Python

Hi, I want to run inference on a .onnx model I created with ML-Agents in a Python script using onnxruntime.

It is a simple model with a discrete action space (branch size 1x3), and it runs correctly in Unity. But when I run it in Python, it returns an array of three numbers as output instead of a single integer. Do you know what those numbers are and how I can convert them into the correct vectorAction?

  • If I do inference in ML-Agents and set the action space to continuous, I also get the same numbers as results. Therefore I think these numbers are what the network actually outputs, and that they are somehow post-processed before being passed on to the agent.

  • My TensorFlow version is 1.13.1 and my ML-Agents version is 0.15.1.

import onnxruntime
import numpy as np

model = "model.onnx"
sess = onnxruntime.InferenceSession(model)
output_name = sess.get_outputs()[0].name

# vector_observation:0
x = np.array([[-4.141203, -0.8933127, -3.927535, -1.150026]], dtype=np.float32)
# action_masks:0
y = np.array([[-1.031152, -1.114622, -1.154025]], dtype=np.float32)

result = sess.run([output_name], {"vector_observation:0": x, "action_masks:0": y})

Output:
[array([[-1.0638015, -1.1055297, -1.1275567]], dtype=float32)]
The vector action in Unity for the same input was “0”.

Thanks!

Those are the log probabilities of each action in the branch being chosen. If you raise e to each one, you’ll see that they sum to 1.0:

>>> import math
>>> log_probs = [-1.0638015, -1.1055297, -1.1275567]
>>> probs = [math.pow(math.e, lp) for lp in log_probs]
>>> sum(probs)
1.0000002336510594

You can see how this is used in C# in this part of the code: ml-agents/com.unity.ml-agents/Runtime/Inference/ApplierImpl.cs at 0.15.1 · Unity-Technologies/ml-agents · GitHub. The src.data array contains the log probabilities; we convert them to normal probability space and then select an index based on the probability weight.
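
In Python terms, the selection roughly corresponds to the following (a sketch of the idea, not the actual ml-agents code; np.random.choice stands in here for the weighted selection done in C#):

import numpy as np

# Log probabilities from the ONNX output in the question above
log_probs = np.array([-1.0638015, -1.1055297, -1.1275567])

# Convert back to normal probability space
probs = np.exp(log_probs)
probs = probs / probs.sum()  # guard against small numerical drift

# Select an index weighted by those probabilities (sampling, not argmax)
action = np.random.choice(len(probs), p=probs)
print(action)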

Thank you! So I should be able to select the index of the largest number, and that should be the agentAction. But from the numbers I get, the first is always the largest, for example (see the sketch after the table):
.onnx inference result (Python)          correct agent action (from inference in Unity)
[-1.0044976, -1.1231222, -1.176004]      0
[-1.0019577, -1.1218884, -1.180334]      0
[-1.0027038, -1.1222707, -1.179038]      0
[-1.0023875, -1.1221073, -1.1795887]     1
[-1.0007201, -1.1212393, -1.1825048]     2
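
To be explicit about what I am doing on the Python side, I simply take the argmax of the raw output (np.argmax is my own choice here, not something from ml-agents):

import numpy as np

# One of the rows from the table above; Unity chose action 1 for this observation
log_probs = np.array([-1.0023875, -1.1221073, -1.1795887])

# Picking the index of the largest log probability always gives 0 here
action = int(np.argmax(log_probs))
print(action)  # prints 0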

My guess is that it has to do with the action masks. I don’t use any action masks in my agent, so I tried to override the default action mask collector:

    public override void CollectDiscreteActionMasks(DiscreteActionMasker actionMasker)
    {
        // Pass an empty list so that no action indices are masked for branch 0
        List<int> actionIndices = new List<int>();
        actionMasker.SetMask(0, actionIndices);
    }

But the model still expects an array of three values for “action_masks:0” in order to run. For that input I always passed [1, 1, 1].
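
For reference, this is roughly how I call the model with that all-ones mask (the input and output names are the ones onnxruntime reports for my exported model):

import onnxruntime
import numpy as np

sess = onnxruntime.InferenceSession("model.onnx")
output_name = sess.get_outputs()[0].name

# An example observation, same shape as vector_observation:0 expects
obs = np.array([[-4.141203, -0.8933127, -3.927535, -1.150026]], dtype=np.float32)
# All-ones mask: none of the three discrete actions is masked out
mask = np.ones((1, 3), dtype=np.float32)

result = sess.run([output_name], {"vector_observation:0": obs, "action_masks:0": mask})
print(result)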