Inconsistent Model Inference (.onnx)

I am trying to implement MoveNet in Unity Sentis for pose detection. The code I've written is given below:


using UnityEngine;
using Unity.Sentis;

public class RunGoldenImage : MonoBehaviour
{
    public ModelAsset model;
    [SerializeField] private Texture2D inputImage;
    [SerializeField] private float[] results;

    IWorker worker;

    static Unity.Sentis.BackendType backendType = Unity.Sentis.BackendType.GPUCompute;

    const int imageWidth = 192;

    TensorFloat inputTensor = null;

    void Start()
    {
        Model runtimeModel = ModelLoader.Load(model);
        worker = WorkerFactory.CreateWorker(backendType, runtimeModel);
    }

    public void ExecuteML(Texture2D inputImage)
    {
        inputTensor?.Dispose();

        var transform = new TextureTransform();
        transform.SetDimensions(192, 192, 3);
        transform.SetTensorLayout(0, 3, 1, 2);  // NHWC format

        inputTensor = TextureConverter.ToTensor(inputImage, transform);

        // Normalize the input tensor if needed (e.g., divide by 255.0)
        var inputArray = inputTensor.ToReadOnlyArray();
        for (int i = 0; i < inputArray.Length; i++)
        {
            inputArray[i] /= 255.0f;
        }
        inputTensor = new TensorFloat(inputTensor.shape, inputArray);

        worker.Execute(inputTensor);

        TensorFloat output = worker.PeekOutput() as TensorFloat;
        output.CompleteOperationsAndDownload();
        results = output.ToReadOnlyArray();

        // Debugging statements to compare input and output
        Debug.Log("Input tensor: " + string.Join(", ", inputTensor.ToReadOnlyArray()));
        Debug.Log("Output tensor: " + string.Join(", ", results));
    }

    void Update()
    {
        ExecuteML(inputImage);
    }

    private void OnDestroy()
    {
        inputTensor?.Dispose();
        worker?.Dispose();
    }
}

The code just takes an image as input and stores the 51 (17 × 3) coordinates and confidence scores of the landmarks. The model runs fine, but the confidence scores of all the landmarks come out very low (0.08-0.1), which made me question the validity of the model's weights.
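
For reference, here's roughly how I unpack the flat results array (a sketch assuming MoveNet's single-pose output layout: 17 keypoints, each stored as a (y, x, score) triplet):

void LogKeypoints(float[] results)
{
    for (int i = 0; i < 17; i++)
    {
        float y = results[i * 3 + 0];     // normalized row coordinate
        float x = results[i * 3 + 1];     // normalized column coordinate
        float score = results[i * 3 + 2]; // keypoint confidence
        Debug.Log($"Keypoint {i}: x={x:F2}, y={y:F2}, score={score:F2}");
    }
}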

As a next step, I tried to recreate this exact code in Python (given below):

import onnx
import onnxruntime
import numpy as np
import cv2
import pprint

model_path = "model_float32.onnx"

def main():
    model = onnx.load(model_path)
    onnx.checker.check_model(model)

    sess = onnxruntime.InferenceSession(model_path)

    image = cv2.imread('image.png')                 # uint8 BGR, values in 0-255
    frame = cv2.resize(image, (192, 192))           # model expects 192x192 input
    frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)  # BGR -> RGB
    frame = np.expand_dims(frame, axis=0)           # add batch dim -> (1, 192, 192, 3)
    frame = frame.astype('float32')                 # cast to float, keep the 0-255 range

    inputs = {sess.get_inputs()[0].name: frame}
    outputs = sess.run(None, inputs)

    pprint.pprint(outputs)

if __name__ == "__main__":
    main()

with the following output:

[array([[[[0.17123619, 0.7366814 , 0.60726154],
         [0.12668224, 0.7544503 , 0.6296105 ],
         [0.15546177, 0.6871232 , 0.50444686],
         [0.14814584, 0.81889844, 0.4947436 ],
         [0.21623068, 0.64096504, 0.50570345],
         [0.354243  , 0.9810541 , 0.5238346 ],
         [0.54735583, 0.5287507 , 0.67092067],
         [0.3731925 , 0.99938095, 0.01898247],
         [0.96912557, 0.40461412, 0.30012062],
         [0.31255066, 0.92820466, 0.02434874],
         [0.3674486 , 0.69174904, 0.07895985],
         [0.98664856, 0.9479264 , 0.19786224],
         [1.0049582 , 0.6971894 , 0.15330565],
         [0.9049185 , 0.9648921 , 0.06867296],
         [0.95806223, 0.7389217 , 0.0637106 ],
         [1.0060338 , 0.92667395, 0.02732971],
         [1.0052091 , 0.6853877 , 0.05690652]]]], dtype=float32)]

The confidence scores of the same model on the same image are much higher in Python, which makes me question whether I messed up anything in my Sentis code. If you have any idea why this is the case, please let me know.

Thank You

Can you provide a link to the model? I’ll try to reproduce it myself.

Link to model: model_float32.onnx - Google Drive

Steps to recreate:

  • Open a new scene in Unity
  • Make an empty GameObject
  • Drop the C# script onto the GameObject
  • Drag and drop your image and model into the dedicated fields
  • Run the game
  • Results can be seen in the GameObject's Results field

I have only briefly looked at the code, but I think this is likely an issue with input normalization.

AFAIK cv2.imread returns values in the 0-255 range. If that's what the model expects, you need to multiply each value by 255 on the Unity side, since TextureConverter.ToTensor returns a tensor with float values in the 0.0-1.0 range.
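
In code that would be something like this, right after the ToTensor call (just a sketch of the idea, reusing your existing loop):

// Undo the implicit 0-255 -> 0.0-1.0 conversion that ToTensor performs,
// so the values are back in the 0-255 range the model expects.
for (int i = 0; i < inputArray.Length; i++)
{
    inputArray[i] *= 255.0f;  // multiply instead of dividing
}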

Unfortunately, changing the range to 0-255 didn't make a difference. The output is all still random, with equally low confidence scores. Please let me know if there's anything else you find.

Same thing. Random positions.

How was the model you’ve linked exported? I’ve looked at the model in Netron to try to figure out what input it expects. It looks like there is some input normalization already baked into the model.
[Netron screenshot of the model's input preprocessing]

It basically does (input * 1/255 * 2) - 1, so the model maps inputs from [0, 255] to [-1.0, 1.0].

So it sounds like my above suggestion should work, but if you control the exporting code, I'd personally try to unify this so no redundant calculations are done. You can also see that there is already a (0, 3, 1, 2) Transpose operation baked into the model, so you're doing that transpose twice, with the two canceling each other out; not sure if that's intended.
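
Plugging in the endpoints confirms that mapping (plain arithmetic, nothing Sentis-specific; Preprocess is just an illustrative name):

float Preprocess(float v) => (v / 255f) * 2f - 1f;
// Preprocess(0f)     == -1.0f  (black maps to the lower bound)
// Preprocess(127.5f) ==  0.0f  (mid-gray maps to zero)
// Preprocess(255f)   ==  1.0f  (white maps to the upper bound)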

Lastly, when running your code, in the places where you're calling inputTensor.ToReadOnlyArray() I'm getting an InvalidOperationException: "Tensor data cannot be read from, use .CompleteOperationsAndDownload() to allow reading from tensor."
After adding inputTensor.CompleteOperationsAndDownload() before each of these calls, the code runs fine and I'm getting reasonable output values for the test image I'm using (running this with Sentis v1.5.0-pre.3).
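
In other words, the read-back pattern should be:

inputTensor.CompleteOperationsAndDownload();    // finish pending work and download to the CPU
var inputArray = inputTensor.ToReadOnlyArray(); // now safe to read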

After scaling the texture data back up to the 0-255 range, everything worked! Thanks julienkay!

inputTensor = TextureConverter.ToTensor(inputImage, tr);
inputTensor.CompleteOperationsAndDownload();  // download to the CPU so the data can be read

var inputArray = inputTensor.ToReadOnlyArray();
for (int i = 0; i < inputArray.Length; i++)
{
    inputArray[i] *= 255.0f;  // rescale 0.0-1.0 back to the 0-255 range the model expects
}
inputTensor = new TensorFloat(inputTensor.shape, inputArray);
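
As a possible extra cleanup (a sketch, not part of the fix itself): the tensor returned by ToTensor is replaced on the last line without being disposed, which leaks its native memory on every call. Capturing the shape and disposing it first avoids that:

var shape = inputTensor.shape;
inputTensor.Dispose();                             // free the intermediate tensor from ToTensor
inputTensor = new TensorFloat(shape, inputArray);  // rebuild from the rescaled CPU data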


You’re a legend. The model works now.
The mistake I made while following your advice was that I just commented out the part where the pixels were being divided, instead of replacing the division with a multiplication.

It works fast and accurately now; I cannot thank you enough.
