Hi Unity team and community:
I am trying to implement a U-Net model for denoising as a post-process for rendering with Sentis. Everything works well, but inference (i.e. engine.Execute()) takes much more time than I expected. Here are some screenshots of the rendering stats and profiling:
As you can see, the total time for rendering plus post-processing in one frame is about 84 ms, and denoising accounts for most of it.
After digging deeper, I found that ConvTranspose2D_Kxk and Upsample2D_Nearest_Float take almost 40% of the time. This
is unacceptable because the model is pretty light (280k parameters, 1.1 MB); the input and output tensors are both of size (1, 4, 1080, 1920).
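For reference, here is a quick back-of-envelope check of the sizes involved. This is only a sketch that assumes float32 weights and float32 tensors (the precision is my assumption, and `tensor_bytes` is just a helper name for illustration). It suggests the 1.1 MB model size is consistent with 280k parameters, while a single input or output tensor at this resolution is much larger than the weights.

```python
# Back-of-envelope sizes from the numbers in this post.
# Assumption: weights and tensors are stored as float32 (4 bytes each).
BYTES_PER_FLOAT32 = 4


def tensor_bytes(shape, bytes_per_element=BYTES_PER_FLOAT32):
    """Return the size in bytes of a dense tensor with the given shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_element


param_count = 280_000            # "280k parameters" from the post
io_shape = (1, 4, 1080, 1920)    # input and output tensor shape

weights_mb = param_count * BYTES_PER_FLOAT32 / 1e6
io_mb = tensor_bytes(io_shape) / 1e6

print(f"weights: {weights_mb:.2f} MB")
print(f"one input/output tensor: {io_mb:.2f} MB")
```

So even though the weights are small, every full-resolution layer has to read and write tensors of roughly this order of size, which may be part of why the upsampling ops are not as cheap as the parameter count suggests.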
For comparison, I also tried a native plugin that uses CUDA and ONNX Runtime for inference. It takes only about 40 ms to do the same job.
So I really wonder why these ops take so much time. Can I expect them to be optimized in the future?
Hope somebody can help. Thanks!