I’m trying to run the depth estimation ONNX model from
https://github.com/isl-org/MiDaS/releases/download/v2_1/model-small.onnx
But it gives a rank 3 tensor as its output:
(?, ?, ?) instead of (?, ?, ?, ?)
When I try to call TextureConverter.RenderToTexture, it gives me an error:
BlitTensorToTexture.RankError: tensor rank should be equal to 4, got 3
Do you have any suggestions on how to solve this issue?
I tried
TensorShape ts = new TensorShape(1, 3, 256, 256);
TensorFloat reshapedTensor = outputTensor.ShallowReshape(ts) as TensorFloat;
and it eliminated the rank error, but it gives a wrong result.
I apparently need to learn a bit more about tensor modification.
MiDaS gives you a 1,256,256 output, which is a grayscale image.
You want to convert it to 4D, so use tensor.ShallowReshape(new TensorShape(1, 1, 256, 256));
and then convert it to a RT.
Your code doesn’t work because you are creating new dimensions. If you want to broadcast to a 3 channel tensor you need op.Tile(t, new [] {1, 3, 1, 1})
or use Expand.
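To illustrate the reshape-versus-tile point above, here is a minimal sketch in plain C#, independent of Sentis. A reshape only relabels the elements already in the buffer, whereas tiling copies the single depth plane once per channel. DepthTensorSketch and TileChannels are hypothetical names used only for illustration; in Sentis the tiling itself would be done by the Tile op mentioned above.

```csharp
using System;

public static class DepthTensorSketch
{
    // Hypothetical helper: copies one H*W plane into `channels` consecutive
    // planes, which is the memory layout of a (1, channels, H, W) tensor.
    public static float[] TileChannels(float[] plane, int channels)
    {
        if (plane == null) throw new ArgumentNullException(nameof(plane));
        if (channels < 1) throw new ArgumentOutOfRangeException(nameof(channels));

        var tiled = new float[plane.Length * channels];
        for (int c = 0; c < channels; c++)
        {
            // Each channel is an identical copy of the grayscale plane.
            Array.Copy(plane, 0, tiled, c * plane.Length, plane.Length);
        }
        return tiled;
    }
}
```

For a 256 x 256 depth map, the input holds 65,536 values and the tiled result holds 196,608, which is why a reshape alone cannot turn the (1, 256, 256) output into a (1, 3, 256, 256) tensor.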
Thank you for the reply
I tried tensor.ShallowReshape(new TensorShape(1, 1, 256, 256)) too, but it gave me a flat red image.
After ops.Tile(reshapedTensor, new int[]{1, 3, inputWidth, inputHeight}) as TensorFloat, it gives me just black.
If I do tensor.ShallowReshape(new TensorShape(1, 1, 256, 256)) I can see something moving, but the layout is broken.
Would you be able to see what I am doing wrong in my code?
using Unity.Sentis;
using UnityEngine;
using UnityEngine.UI;

public class AIDepthEstimation : MonoBehaviour
{
    private WebCamTexture camTex;
    public RawImage outputImageUI;
    public ModelAsset onnxModel;
    RenderTexture outputTexture;
    TensorFloat inputTensor;
    TensorFloat outputTensor;
    IWorker worker;
    Ops ops;
    public int inputWidth;
    public int inputHeight;

    void Start()
    {
        camTex = GetComponent<GetCameraImage>().camTex;
        Model runtimeModel = ModelLoader.Load(onnxModel);
        worker = WorkerFactory.CreateWorker(BackendType.GPUCompute, runtimeModel);
        ops = WorkerFactory.CreateOps(BackendType.GPUCompute, new TensorCachingAllocator());
        outputTexture = new RenderTexture(inputWidth, inputHeight, 0);
        outputImageUI.texture = outputTexture;
    }

    void Update()
    {
        inputTensor = TextureConverter.ToTensor(camTex, new TextureTransform().SetDimensions(inputWidth, inputHeight, 3));
        outputTensor = worker.Execute(inputTensor).PeekOutput() as TensorFloat;
        outputTexture.Release();
        TensorShape ts = new TensorShape(1, 1, inputWidth, inputHeight);
        TensorFloat reshapedTensor = outputTensor.ShallowReshape(ts) as TensorFloat;
        TensorFloat tiledTensor = ops.Tile(reshapedTensor, new int[]{1, 3, inputWidth, inputHeight}) as TensorFloat;
        TextureConverter.RenderToTexture(tiledTensor, outputTexture);
        inputTensor.Dispose();
        outputTensor.Dispose();
        reshapedTensor.Dispose();
        tiledTensor.Dispose();
    }
}
I think the tile should be { 1, 3, 1, 1 } as you only want to tile on the channels.
Thank you, it worked!
The one last issue was that I needed to scale the output, since the output is not in the 0-1 range but more like 0-10000.
It's probably a good idea to pass the output through a Sigmoid
to collapse everything into the 0-1 range.
Or you can normalize the output with GlobalAveragePool.
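The scaling step discussed above can also be sketched as a min-max rescale, which maps the smallest depth value to 0 and the largest to 1. Note that this is a different technique from the Sigmoid and GlobalAveragePool suggestions, and it is a plain C# illustration on a float array rather than Sentis API code. DepthNormalizeSketch and MinMaxNormalize are hypothetical names, and running this on the CPU is an assumption made for clarity, not a recommendation.

```csharp
using System;

public static class DepthNormalizeSketch
{
    // Hypothetical helper: linearly rescales values so that min -> 0 and max -> 1.
    public static float[] MinMaxNormalize(float[] values)
    {
        if (values == null) throw new ArgumentNullException(nameof(values));
        if (values.Length == 0) return Array.Empty<float>();

        float min = values[0];
        float max = values[0];
        foreach (float v in values)
        {
            if (v < min) min = v;
            if (v > max) max = v;
        }

        var result = new float[values.Length];
        float range = max - min;
        if (range <= 0f) return result; // constant input: return all zeros

        for (int i = 0; i < values.Length; i++)
        {
            result[i] = (values[i] - min) / range;
        }
        return result;
    }
}
```

One reason to consider a linear rescale for raw values in the 0-10000 range is that a sigmoid applied directly to them saturates: sigmoid(10) is already above 0.9999, so most pixels would end up near 1 unless the values are scaled down first.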