I would really like to be able to run some commands on a separate thread.
For example, TextureConvert.ToTensor() and engine.Execute() only seem to work on the main thread, which seems to mean the CPU on the main thread is blocked until the results come back from the GPU.
I have tried running the models a few layers per frame, but I don’t think this solves the issue, since the CPU is still idle until the results are back from the GPU. It’s also quite tricky to work out how many layers per frame will be optimal on every device to avoid a choppy framerate.
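To illustrate the layers-per-frame problem: one way around picking a fixed layer count per device might be to step the schedule until a time budget runs out. This is only a rough sketch, assuming the layer-by-layer API hands back a plain IEnumerator; BudgetedStepper and budgetMs are names made up here and are not part of Sentis.

```csharp
using System.Collections;
using System.Diagnostics;

// Hypothetical helper, not part of Sentis: advances a layer-by-layer
// schedule until a per-frame CPU time budget has been used up.
public static class BudgetedStepper
{
    // Returns true while the schedule still has layers left to run.
    public static bool Step(IEnumerator schedule, double budgetMs)
    {
        var timer = Stopwatch.StartNew();
        do
        {
            // Always advance at least one step so progress is guaranteed.
            if (!schedule.MoveNext())
                return false; // every layer has been scheduled
        } while (timer.Elapsed.TotalMilliseconds < budgetMs);

        return true; // budget spent, carry on next frame
    }
}
```

Update() would call BudgetedStepper.Step(schedule, 2.0) once per frame and read the output once it returns false. Note that the stopwatch only measures CPU time spent scheduling layers, not GPU time, so this would smooth the framerate at best and would not stop the CPU from waiting on the GPU results.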
Or maybe there could be something like:
engine.ExecuteAsync(callback);
Another example: In the ExecuteOnTensor sample there is the line:
var outputTensor = s_Ops.ArgMax(m_InputTensor, axis: 0, keepdim: true);
Maybe operations like this could be made asynchronous, for example:
var outputTensor = await s_OpsAsync.ArgMax(m_InputTensor, axis: 0, keepdim: true);
When using ONNX Runtime in Unity, I am able to run inference on a separate thread without affecting the framerate, which makes it really simple to use (e.g. I could run Stable Diffusion for 20 seconds in the background while still displaying real-time graphics).
If there were a simpler way to run models in the background without affecting the framerate, that would be excellent. (Maybe this is impossible without rewriting the whole of Unity, I don't know.)
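As a minimal sketch of the background pattern I have in mind: run the work on a thread-pool thread and hand the result back to the main thread through a queue. BackgroundRunner, Run and Pump are names invented for this example, and the work delegate stands in for any inference call that is safe to run off the main thread (which, as described above, engine.Execute() currently does not seem to be).

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical helper: runs work on a thread-pool thread and hands the
// result back to whichever thread calls Pump() (e.g. Unity's main thread).
public sealed class BackgroundRunner
{
    private readonly ConcurrentQueue<Action> _completed =
        new ConcurrentQueue<Action>();

    // 'work' is a placeholder for any thread-safe inference call.
    public Task Run<T>(Func<T> work, Action<T> onDone)
    {
        return Task.Run(() =>
        {
            T result = work();                        // runs off the main thread
            _completed.Enqueue(() => onDone(result)); // callback is deferred
        });
    }

    // Call once per frame from the main thread, e.g. in Update().
    public void Pump()
    {
        while (_completed.TryDequeue(out Action callback))
            callback();
    }
}
```

The callback only fires inside Pump(), so anything that has to touch Unity objects stays on the main thread while the heavy work runs elsewhere.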
Edit:
I have just seen this page: "Read output from a model asynchronously" (Sentis 1.0.0-exp.6).
While it does help a bit (I think mainly because it’s calling the model every other frame), calling Execute() still blocks the CPU for a while.