Hello!
I’m trying to wrap my head around running language models in inference mode at runtime without losing too many frames.
The problem is that a simple Execute or Schedule call (in Sentis 1.6 or 2.0 respectively), which should be non-blocking, is freezing the main thread,
and it can’t be called from another thread.
Is this normal, and did I just misinterpret the “non-blocking” expression?
I’m attaching a sample script if you want to try it for yourself: Phi3Sentis2.cs (1.5 KB) (don’t mind the initial loading time, and press Return after it to trigger the worker’s scheduling method)
I’m running this on Unity 2023.2.6f1 with Sentis 2.0 and the model to run can be found here: Sentis Phi 3.5 Uint8
Will I experience the same kind of unavoidable processing time with something like ReadbackRequest, which is marked as async?
Thank you in advance for your help 
We renamed Execute to Schedule to better indicate what it does.
It schedules the workload on a given backend.
So for GPU it appends all the compute work to a command buffer and then dispatches it.
For CPU it schedules all the jobs to compute the different layers.
By non-blocking we mean that we will not wait for the result to be done. We only schedule the workload.
Unfortunately that needs to happen on the main thread.
So you cannot schedule the work on a secondary thread.
The same goes for memory allocation: we allocate when we first schedule the work and re-use the existing allocations afterwards, so the first run will take extra time.
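One common way to keep the per-frame cost down (a sketch, not a drop-in fix: it assumes Sentis 2.x, a `Worker.ScheduleIterable` call that yields between layers, a single-input model, and a placeholder `modelAsset` field) is to spread the scheduling over several frames from a coroutine and poll the readback asynchronously instead of downloading the result in a blocking way:

```csharp
using System.Collections;
using Unity.Sentis;
using UnityEngine;

// Sketch: spread the scheduling of a Sentis model over several frames
// and read the result back without blocking the main thread.
// "modelAsset", the layer count per frame, and the float output type
// are assumptions to adapt to your own model.
public class SplitInference : MonoBehaviour
{
    public ModelAsset modelAsset;   // assign in the Inspector
    Worker worker;

    void Start()
    {
        var model = ModelLoader.Load(modelAsset);
        worker = new Worker(model, BackendType.GPUCompute);
    }

    public IEnumerator RunSplit(Tensor input, int layersPerFrame = 20)
    {
        // ScheduleIterable yields after each layer, so the scheduling
        // cost can be amortised over multiple frames.
        var it = worker.ScheduleIterable(input);
        int scheduled = 0;
        while (it.MoveNext())
        {
            if (++scheduled % layersPerFrame == 0)
                yield return null;  // resume next frame
        }

        // Request an async readback and poll it instead of blocking.
        var output = worker.PeekOutput() as Tensor<float>;
        output.ReadbackRequest();
        while (!output.IsReadbackRequestDone())
            yield return null;

        using var result = output.ReadbackAndClone();  // cheap now
        Debug.Log(result[0]);
    }

    void OnDestroy() => worker?.Dispose();
}
```

Tuning `layersPerFrame` trades total latency against per-frame spikes; for a large language model you would typically start low and raise it until the profiler shows an acceptable frame cost.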
Hello Alexandre,
Thank you for your explanation, it’s very instructive!
I get that the first schedule takes time, but my bad for not mentioning that even though the first one is in a different league computationally, every subsequent Schedule call still takes too much time (far less than the first, but still too much) to run this model at runtime.
Here is a capture of the profiler because it’s a bit long to set everything up to test my project:

Do I have to accept that I won’t be able to run language models like this at runtime for now?
I found this thread about inference optimisations, and I guess many of them have been implemented since last year, but do you think future versions of Sentis may bring further improvements and maybe allow us to build local, AI-driven NPCs with these amazing small language models?
Thank you again for your time.