ONNX model gives rank 3 output. How can I convert it to rank 4?

I’m trying to run the MiDaS depth estimation ONNX model from
https://github.com/isl-org/MiDaS/releases/download/v2_1/model-small.onnx

But it gives a rank 3 tensor as its output:
(?, ?, ?) instead of (?, ?, ?, ?)

When I tried to call TextureConverter.RenderToTexture, it gave me an error:
BlitTensorToTexture.RankError: tensor rank should be equal to 4, got 3

Do you have any suggestions on how to solve this issue?

I tried

        TensorShape ts = new TensorShape(1, 3, 256, 256);
        TensorFloat reshapedTensor = outputTensor.ShallowReshape(ts) as TensorFloat;

This eliminated the rank error, but it gives a wrong result.
I apparently need to learn a bit more about tensor modification.

MiDaS gives you a (1, 256, 256) output, which is a grayscale depth map.
You want to convert it to 4D, so use tensor.ShallowReshape(new TensorShape(1, 1, 256, 256)); and then convert it to a RenderTexture.
Your code doesn’t work because the target shape (1, 3, 256, 256) has three times as many elements as the model output, and a reshape cannot create new data. If you want to broadcast to a 3-channel tensor, you need ops.Tile(t, new [] {1, 3, 1, 1}) or Expand.
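
The difference between reshaping and tiling can be sketched in plain C#, without Sentis. This is only an illustration under stated assumptions: the tensor is treated as a flat float array in channel-first order with a batch size of 1, and `TileChannels` is a hypothetical helper, not a Sentis call.

```csharp
using System;

public static class TileSketch
{
    // Hypothetical helper: repeats a single-channel image (1 x 1 x H x W, flattened)
    // along the channel axis, mimicking a tile with repeats {1, channels, 1, 1}.
    public static float[] TileChannels(float[] gray, int channels)
    {
        var result = new float[gray.Length * channels];
        for (int c = 0; c < channels; c++)
        {
            // With batch size 1 and channel-first layout, each channel is one
            // contiguous copy of the grayscale data.
            Array.Copy(gray, 0, result, c * gray.Length, gray.Length);
        }
        return result;
    }

    public static void Main()
    {
        float[] gray = { 0.1f, 0.2f, 0.3f, 0.4f }; // a 2 x 2 depth map
        float[] rgb = TileChannels(gray, 3);        // 3 channels of 2 x 2 values

        // A reshape keeps the element count (4); tiling multiplies it (12).
        Console.WriteLine(gray.Length);
        Console.WriteLine(rgb.Length);
    }
}
```

A reshape only reinterprets the existing values, so it can add a dimension of size 1 but not a dimension of size 3; the extra channels have to be produced by copying.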

Thank you for the reply!
I tried tensor.ShallowReshape(new TensorShape(1, 1, 256, 256)) too, but it gave me flat red.
After ops.Tile(reshapedTensor, new int[]{1, 3, inputWidth, inputHeight}) as TensorFloat, it gives me just black.

If I do tensor.ShallowReshape(new TensorShape(1, 1, 256, 256)), I can see something moving, but the layout is broken.

Would you be able to see what I am doing wrong in my code?

using Unity.Sentis;
using UnityEngine;
using UnityEngine.UI;

public class AIDepthEstimation : MonoBehaviour
{
    private WebCamTexture camTex;
    public RawImage outputImageUI;
    public ModelAsset onnxModel;

    RenderTexture outputTexture;
    TensorFloat inputTensor;
    TensorFloat outputTensor;
    IWorker worker;
    Ops ops;

    public int inputWidth;
    public int inputHeight;

    void Start()
    {
        camTex = GetComponent<GetCameraImage>().camTex;
        Model runtimeModel = ModelLoader.Load(onnxModel);
        worker = WorkerFactory.CreateWorker(BackendType.GPUCompute, runtimeModel);
        ops = WorkerFactory.CreateOps(BackendType.GPUCompute, new TensorCachingAllocator());
        outputTexture = new RenderTexture(inputWidth, inputHeight, 0);
        outputImageUI.texture = outputTexture;
    }

    void Update()
    {
        inputTensor = TextureConverter.ToTensor(camTex, new TextureTransform().SetDimensions(inputWidth, inputHeight, 3));
        outputTensor = worker.Execute(inputTensor).PeekOutput() as TensorFloat;
        outputTexture.Release();

        TensorShape ts = new TensorShape(1, 1, inputWidth, inputHeight);
        TensorFloat reshapedTensor = outputTensor.ShallowReshape(ts) as TensorFloat;
        TensorFloat tiledTensor = ops.Tile(reshapedTensor, new int[]{1, 3, inputWidth, inputHeight}) as TensorFloat;

        TextureConverter.RenderToTexture(tiledTensor, outputTexture);

        inputTensor.Dispose();
        outputTensor.Dispose();
        reshapedTensor.Dispose();
        tiledTensor.Dispose();
    }
}

I think the tile should be { 1, 3, 1, 1 } as you only want to tile on the channels.
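
To see why the repeats matter, here is a small plain C# sketch of how a tile operation determines its output shape. It assumes the tile follows the ONNX Tile rule, where each output dimension is the input dimension multiplied by the matching repeat; `TiledShape` is a hypothetical helper written for this illustration.

```csharp
using System;

public static class TileShapeSketch
{
    // Hypothetical helper: output shape of a tile, assuming ONNX Tile semantics
    // (output dimension i = input dimension i * repeats[i]).
    public static int[] TiledShape(int[] shape, int[] repeats)
    {
        if (shape.Length != repeats.Length)
            throw new ArgumentException("shape and repeats must have the same rank");

        var result = new int[shape.Length];
        for (int i = 0; i < shape.Length; i++)
        {
            result[i] = shape[i] * repeats[i];
        }
        return result;
    }

    public static void Main()
    {
        int[] shape = { 1, 1, 256, 256 };

        // Tiling only the channel axis gives the intended 3-channel image.
        Console.WriteLine(string.Join(", ", TiledShape(shape, new[] { 1, 3, 1, 1 })));

        // Passing the image size as repeats also multiplies height and width by 256.
        Console.WriteLine(string.Join(", ", TiledShape(shape, new[] { 1, 3, 256, 256 })));
    }
}
```

Under that assumption, the first call yields 1, 3, 256, 256, while the second yields 1, 3, 65536, 65536, which is far larger than the 256 x 256 render texture.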

Thank you, it worked!
The one last issue was that I needed to scale the output, since it is not in the 0-1 range but more like 0-10000.

It’s probably a good idea to pass the output through a Sigmoid to squash everything into the 0-1 range.
Or you can normalize the output with GlobalAveragePool.
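
As a point of comparison, the scaling step can also be sketched as min-max normalization in plain C#. Note that this is a different technique from the Sigmoid and GlobalAveragePool suggestions above, and `MinMaxNormalize` is a hypothetical helper operating on a flat array rather than on a Sentis tensor. One thing to keep in mind with Sigmoid: on raw values in the thousands it returns values extremely close to 1, so the output would likely need to be scaled down first.

```csharp
using System;
using System.Linq;

public static class DepthNormalizeSketch
{
    // Hypothetical helper: linearly rescales values so the smallest maps to 0
    // and the largest maps to 1.
    public static float[] MinMaxNormalize(float[] values)
    {
        var result = new float[values.Length];
        if (values.Length == 0) return result;

        float min = values.Min();
        float max = values.Max();
        float range = max - min;

        // A constant input has no range to stretch; return all zeros.
        if (range <= 0f) return result;

        for (int i = 0; i < values.Length; i++)
        {
            result[i] = (values[i] - min) / range;
        }
        return result;
    }

    public static void Main()
    {
        float[] depth = { 0f, 2500f, 10000f };
        float[] scaled = MinMaxNormalize(depth); // values: 0, 0.25, 1
        Console.WriteLine(scaled.Length);
    }
}
```

Because the minimum and maximum are recomputed for every frame, the brightness of the displayed depth map can flicker from frame to frame; a fixed scale factor avoids that at the cost of clipping unexpected values.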

Great, thank you!
