Slow inference caused by ConvTranspose and Upsample2D ops (vs. ONNX Runtime)

Hi Unity team and community:

I am trying to implement a U-Net model for denoising as a rendering post-process with Sentis. Everything works well, but the inference (i.e. engine.Execute()) takes much more time than I expected. Here are some screenshots of the rendering stats and profiling:

As you can see, the total time for rendering plus post-processing in one frame is about 84 ms, and denoising accounts for most of it.

After digging deeper, I found that ConvTranspose2D_Kxk and Upsample2D_Nearest_Float account for almost 40% of the time. This
is unacceptable because the model is pretty light (280k parameters, 1.1 MB); the input and output tensors are both of size (1, 4, 1080, 1920).
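
For a sense of scale, here is a rough sketch of the activation sizes behind that tensor shape. It is purely illustrative Python: the I/O shape comes from the numbers above, while the 32-channel full-resolution feature map is a hypothetical example and is not read from the model.

```python
# Rough activation-size arithmetic for the tensor shape quoted above.
# The 32-channel hidden feature map is a hypothetical example, not taken from the model.
from math import prod

def tensor_bytes(shape, bytes_per_element):
    """Return the memory footprint of a dense tensor in bytes."""
    return prod(shape) * bytes_per_element

io_shape = (1, 4, 1080, 1920)        # input/output shape from the post
hidden_shape = (1, 32, 1080, 1920)   # hypothetical full-resolution feature map

io_elements = prod(io_shape)
print(io_elements)                              # 8294400 elements
print(tensor_bytes(io_shape, 4) / 2**20)        # 31.640625 MiB as fp32
print(tensor_bytes(hidden_shape, 2) / 2**20)    # 126.5625 MiB as fp16
```

So a model can be small in parameters and still move a lot of data per layer at this resolution, which might be part of why the upsampling ops show up so prominently.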

For comparison, I also use a native plugin that runs inference with CUDA and ONNX Runtime. It takes only about 40 ms to do the same job.
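
For anyone who wants to reproduce a comparable baseline without the native plugin, a minimal timing sketch using the ONNX Runtime Python package could look like the following. This is not the plugin code. The helper names are made up for illustration, and it assumes the model has a single float16 input of shape (1, 4, 1080, 1920).

```python
import time

def mean_latency_ms(fn, warmup=5, iters=20):
    """Call fn() `warmup` times untimed, then return the mean wall-clock time per call in ms."""
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - start) * 1000.0 / iters

def make_onnx_runner(model_path, shape=(1, 4, 1080, 1920), dtype="float16"):
    """Build a zero-argument callable that runs one inference.

    Assumptions: single model input, float16 data, CUDA provider available
    (falls back to CPU otherwise).
    """
    import numpy as np          # imported lazily so the timing helper has no dependencies
    import onnxruntime as ort
    session = ort.InferenceSession(
        model_path,
        providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
    )
    input_name = session.get_inputs()[0].name
    x = np.random.rand(*shape).astype(dtype)
    return lambda: session.run(None, {input_name: x})

# Usage (needs onnxruntime-gpu, numpy and the model file on disk):
# run = make_onnx_runner("image_hfs_color_4_250_3_32_skip_usHDR_fp16.onnx")
# print(f"{mean_latency_ms(run):.1f} ms per inference")
```

Note that each timed call here includes the host-to-device copy of the input, so the number is not directly comparable to a GPU-only measurement.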

So I really wonder why these ops take so much time. Can I expect them to be optimized in the future?

Hope somebody can help. Thanks!

Yeah, that’s not normal.
Upsample shouldn’t be that high.
Could you share the model so we can investigate and optimize a bit?

Sure, but it’s not convenient for me to upload it to the cloud. You can get it from
this GitHub repo; the file is named image_hfs_color_4_250_3_32_skip_usHDR_fp16.onnx

Here is another issue: I also tested Sentis inference performance with OpenGL, and it was even worse, at 150 ms. As far as I know, OpenGL doesn’t support compute shaders well, does it?

OpenGL does support compute shaders.
WebGL doesn’t, and our pixel shader backend is slower than our compute one.
It could be that the driver is slower. I’ll investigate. Thanks for sharing the model.

Thanks for your swift reply! Yes, OpenGL does support compute shaders, but it doesn’t seem to be very efficient for inference, and I never use WebGL.

Hope you can find an answer as soon as possible!

This is known internally as Task 202