Run CPU jobs single threaded (or limit thread usage)

I am trying to make a Sentis STT/TTS model run on Android (Quest 3).
I tried async readback, but running the model on the GPU causes many lag spikes. On the CPU there are far fewer!
Sadly it is still not smooth enough: the URP render jobs cannot run because all the worker threads are taken up by Sentis jobs. How can we eliminate this issue?
Can we run the Sentis jobs on 1 thread?
Or do we need to change something in our setup?

You can’t run it on another thread, but you can spread the execution of the model over many frames so that the GPU still has time to update the graphics. Have a look at this example.

Sadly, with the Unity Whisper sample this still creates lag spikes that are way too high for use in XR. Running on the CPU gave the lowest frame spikes.

Yeah, I think that sample isn’t the most optimized.
If you ever have a profiler capture, it would be nice for us to see where the spikes are coming from.

Here is the profiler data. You can see the editor spikes on the left and the phone spikes on the right.
Benchmarks.data

The spikes are from WaitForSignal, which usually indicates a GPU bottleneck.

Interesting. It performs a lot better with graphics jobs disabled and Sentis using jobs, since the graphics calls can then no longer be held up by the Sentis API.

Here is what I’m seeing:
First frame:
GetTokens takes 102 ms (that’s JsonConvert)
LoadModelDesc takes 60 ms (crazy!)
Next frame:
Execute takes 452 ms, with a bunch of Semaphore.WaitForSignal.
That probably means that Whisper is too costly on the GPU to be run in one frame.
So I’d either split execution across many frames (i.e. dispatch one layer at a time), spending only a few ms per frame,
or switch to the CPU. There, execution will be spread out until you call a completependingtransactions.
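
To make the "split execution across many frames" option concrete, here is a minimal, language-agnostic sketch in Python. It is not Sentis code: `run_layers`, `step_with_budget` and the 4 ms budget are hypothetical names and values chosen for illustration. In Unity the same shape would roughly be a coroutine that advances a layer-by-layer schedule of the worker (assuming your Sentis version exposes one) and waits for the next frame once the budget is used up.

```python
import time
from typing import Callable, Iterator, List


def run_layers(layers: List[Callable[[], None]]) -> Iterator[int]:
    """Hypothetical stand-in for a layer-by-layer schedule.

    Runs one layer, then yields so the caller can decide whether to
    continue in this frame or wait for the next one.
    """
    for index, layer in enumerate(layers):
        layer()
        yield index


def step_with_budget(
    schedule: Iterator[int],
    budget_ms: float,
    clock_ms: Callable[[], float] = lambda: time.perf_counter() * 1000.0,
) -> bool:
    """Advance the schedule until the per-frame budget is spent.

    At least one layer runs per call, so the model always makes progress.
    Returns True once every layer has been executed.
    """
    deadline = clock_ms() + budget_ms
    for _ in schedule:
        if clock_ms() >= deadline:
            return False  # budget used up, resume on the next frame
    return True  # schedule exhausted, the output is ready


if __name__ == "__main__":
    # Pretend model: 20 layers that each take about 1 ms (assumed numbers).
    fake_layers = [lambda: time.sleep(0.001)] * 20
    schedule = run_layers(fake_layers)

    frames = 0
    done = False
    while not done:
        done = step_with_budget(schedule, budget_ms=4.0)
        frames += 1  # in an engine, rendering would happen here
    print(f"inference finished after {frames} frames")
```

The budget is the knob to tune: a smaller value leaves more of each frame for rendering but increases the latency until the result is available.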