How to use multi-threading: I tried running it with multithreading, but it fails with an error telling me that I can’t use the main thread in multithreading.
model.Execute needs to happen on the main thread because we are scheduling job/graphics workloads.
All the layer and math operators run on different threads and are already multithreaded, so there is no need to worry about that.
What you need to worry about is blocking reads:
https://docs.unity3d.com/Packages/com.unity.sentis@1.2/manual/read-output-async.html
I use PoseNet, and the analysis needs the data from 4 PeekOutput calls. How do I read all 4? I used the following code, but it is obviously not right.
private void ProcessOutput(IWorker engine)
{
    Debug.Log("The main thread is output here");
    agoRead = 0;
    // Get model output
    heatmaps = engine.PeekOutput(heatmapLayer) as TensorFloat;
    offsets = engine.PeekOutput(offsetsLayer) as TensorFloat;
    displacementFWD = engine.PeekOutput(displacementFWDLayer) as TensorFloat;
    displacementBWD = engine.PeekOutput(displacementBWDLayer) as TensorFloat;
    heatmaps.AsyncReadbackRequest(ReadbackCallback);
    offsets.AsyncReadbackRequest(ReadbackCallback);
    displacementFWD.AsyncReadbackRequest(ReadbackCallback);
    displacementBWD.AsyncReadbackRequest(ReadbackCallback);
}
int agoRead = 0;
TensorFloat heatmaps, offsets, displacementFWD, displacementBWD;
void ReadbackCallback(bool completed)
{
    if (agoRead >= 3)
    {
        // Put the downloaded tensor data into a readable tensor before indexing.
        heatmaps.MakeReadable();
        offsets.MakeReadable();
        displacementFWD.MakeReadable();
        displacementBWD.MakeReadable();
        Debug.Log(heatmaps[0] == 42);
        Debug.Log($"Output tensor value {heatmaps[0]}");
        Parsemodeldata(heatmaps, offsets, displacementFWD, displacementBWD);
        Debug.Log("The asynchronous reading thread is output here" + agoRead);
    }
    else
    {
        Debug.Log("The asynchronous reading thread is output here" + agoRead);
        agoRead++;
    }
}
private void Parsemodeldata(TensorFloat heatmaps, TensorFloat offsets, TensorFloat displacementFWD, TensorFloat displacementBWD)
{
    heatmapsL = heatmaps.ToReadOnlyArray();
    int stride = (imageDims.y - 1) / (heatmaps.shape[1] - 1);
    stride -= (stride % 8);
The above is my code. I want to read all 4 PeekOutput results at the same time. How can I write this better?
I’d combine delegates to be safe.
Each output has its own async readback, so each one needs its own delegate.
Once all 4 delegates have been called, you know that all 4 outputs are ready to be read.
Your code might have a race condition because it uses the same ReadbackCallback for each of the different outputs.
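One way to sketch the "one delegate per output" idea is a small counting helper. ReadbackBarrier and CallbackFor below are hypothetical names, not part of Sentis; the helper is plain C# so it can be tried outside Unity, and it assumes only that each readback calls its delegate with a bool, as ReadbackCallback does in the code above.

```csharp
using System;

// Hypothetical helper (not a Sentis type): hands out one delegate per output and
// invokes a single "all done" callback after every output has reported in.
public sealed class ReadbackBarrier
{
    private readonly bool[] done;            // one completion flag per output
    private readonly Action<bool> onAllDone; // receives true only if every readback succeeded
    private int remaining;
    private bool allSucceeded = true;

    public ReadbackBarrier(int outputCount, Action<bool> onAllDone)
    {
        if (outputCount <= 0) throw new ArgumentOutOfRangeException(nameof(outputCount));
        this.onAllDone = onAllDone ?? throw new ArgumentNullException(nameof(onAllDone));
        done = new bool[outputCount];
        remaining = outputCount;
    }

    // Returns a distinct delegate for the output at `index`.
    public Action<bool> CallbackFor(int index)
    {
        if (index < 0 || index >= done.Length) throw new ArgumentOutOfRangeException(nameof(index));
        return completed =>
        {
            bool result;
            lock (done)
            {
                if (done[index]) return;     // ignore a duplicate call for the same output
                done[index] = true;
                allSucceeded &= completed;
                remaining--;
                if (remaining > 0) return;   // still waiting for other outputs
                result = allSucceeded;
            }
            onAllDone(result);               // runs exactly once, after the last output
        };
    }
}

public static class Demo
{
    public static void Main()
    {
        var barrier = new ReadbackBarrier(4, ok => Console.WriteLine($"all outputs ready: {ok}"));
        // Simulate the four readbacks completing in an arbitrary order.
        barrier.CallbackFor(2)(true);
        barrier.CallbackFor(0)(true);
        barrier.CallbackFor(3)(true);
        barrier.CallbackFor(1)(true);        // the "all done" callback runs here
    }
}
```

In the posted ProcessOutput, this would presumably mean creating a new barrier per inference and passing barrier.CallbackFor(0) through barrier.CallbackFor(3) to the four AsyncReadbackRequest calls, then doing the MakeReadable calls and Parsemodeldata inside the "all done" callback. Tracking a flag per output, rather than a shared counter, also keeps a repeated callback for one output from being counted twice.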
Alternatively, you can stay on the GPU and copy the tensor output directly to a RenderTexture (RT) or blit it to the screen:
Samples/Convert tensors to textures
Samples/Copy a texture tensor to the screen
I call model.Execute in Update, and the game’s frame rate dropped a lot. If inference runs on multiple threads, it should not affect the frame rate. I have used Unity MediaPipe, and it runs in the background so that, whatever the frame rate and however many images are recognized, the main thread’s frame rate is not affected. With Unity MediaPipe I run the analysis on the CPU at 20 images per second, and my game still runs at more than 100 fps. Have you considered this issue? Multithreaded work should be completely separate from the main thread and should not affect the game’s frame rate.
I don’t know whether the problem is in my code, but I did run into this: on a low-spec PC, the frame rate drops severely while the model is being analyzed.
Could you give us a bit more context?
- Are you running with a CPU worker or a GPU worker?
- How many images are you processing?
- Can you share the model or a profiler view?
I am running the analysis on the GPU.
I am analyzing camera input.
I am using PoseNet, upgraded from the GitHub project cj-mills/Barracuda-PoseNet-Tutorial (a tutorial series on human pose estimation in Unity with the Barracuda inference library).
The reason Unity’s main thread stalls is that it is waiting for the GPU. This is especially obvious on integrated graphics. Is it the case that the analysis runs on multiple threads, but Unity’s main thread still waits for the graphics card to finish before moving on to the next frame?
How do I share my project with you?
This is my Git project; you can download it and take a look. I am using the 2022.3.3 LTS version, and I do an asynchronous read. On a 4070 Ti GPU the frame rate reaches about 300 fps, but on integrated graphics it is only a few dozen. What I want is for my project’s frame rate not to be affected by model analysis. It doesn’t matter if the analysis is slow: just notify me when it is done, and don’t affect the main thread. If model analysis affects the running time of the main thread, I think that is bad. GitHub - qq1342753906/sentis-posenet-: sentis-posenet Test
The cost of dispatching the Execute() call can be very significant, even if the model runs on GPU and uses other threads. I was trying to run wav2vec2 (speech detection) and it was hitting for 9ms to 12ms depending on the hardware, in main thread CPU, just to start the model. Complex models with many layers are just not very performant the way it is right now. For my project, I switched to a simpler model, and traded quality for performance.
@gaorenLXL can you provide a simpler sample, as well as clear repro steps?
@jhughes2112 you can’t really compare wav2vec2 (1.2GB model with ~720 layers) with MobileNet (13.5MB model with ~143 layers).
The 9ms amounts to 0.0125ms per layer (9ms / 720 layers) to schedule on the GPU.
At that rate, MobileNet should take about 2ms to schedule.
I doubt that this is what @gaorenLXL is referring to.
I would put the issue down to a misplaced MakeReadable and not doing an async readback.
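The scheduling estimate in this post is a linear scaling of the measured per-layer cost. As a rough check, here is a minimal sketch that assumes every layer costs the same to schedule, which is a simplification; ScheduleCost and EstimateMs are hypothetical names used only for illustration.

```csharp
using System;

public static class ScheduleCost
{
    // Linear estimate: main-thread scheduling time = layer count * cost per layer.
    // Assumes a uniform per-layer cost, which real models will only approximate.
    public static double EstimateMs(int layerCount, double msPerLayer) => layerCount * msPerLayer;

    public static void Main()
    {
        // Figures quoted in the thread: ~9 ms to schedule ~720 layers.
        double msPerLayer = 9.0 / 720;

        Console.WriteLine($"per-layer cost: {msPerLayer:F4} ms");
        Console.WriteLine($"143 layers:     {EstimateMs(143, msPerLayer):F2} ms");
    }
}
```

With these numbers the 143-layer estimate lands a little under 2ms, which matches the figure given above.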
I am already doing asynchronous reads. Without them, the 4070 Ti only reached 60-70 fps; with them, it reaches 300 fps. Have you looked at the project I shared? Calling Execute uses multiple threads and the GPU, but it feels like the main thread waits for all of the Execute work to finish before starting the next frame.
When the model is being analyzed, does the main thread wait for the GPU? Have you tried it on a low-spec computer? If it blocks the main thread, can Sentis really be used on mobile platforms? Wouldn’t the frame rate be extremely low? That’s my question.
I updated my project: PoseEstimator now has an Asynchronous Reading option, so you can take a look. I don’t know whether my approach is the standard way to do asynchronous reading, but it does work. With asynchronous reading enabled the frame rate increases a lot, but on a low-spec computer it still drops a lot. As I understand it, asynchronous reading and multithreading should not affect the main thread, or the impact should at least be very small. Could you please test it on a laptop without a discrete graphics card?
Maybe my understanding is wrong: Sentis runs the model on the GPU, but does the main thread wait for the GPU to finish before continuing to the next frame?
@gaorenLXL help me help you, ok?
As much as I’d like to, I cannot go through your whole repo to find the issue.
- If you provide me with one file that illustrates the problem, I can answer.
- Otherwise, I can show you an example of a non-blocking GPU read of the output with one of the models in the repo.
this is my model

