I made a small testing app and realized that Unity interferes with all asynchronous background tasks and makes them run 4x slower. Unity no longer interferes with background tasks when I set Application.targetFrameRate to 29, but then the animation fps drops to 20, which is not ideal. This matters because I originally noticed that my speech recognition model runs 4x slower after running quickly for a few seconds, but it runs perfectly fine with a 29 fps cap. I control the fps cap with Application.targetFrameRate = 29.
Why is the performance for the app characteristically different between 29 and 30 target fps? What’s causing the significant interference with background CPU jobs?
This is really easy to reproduce on older iPads. Here are the steps:
- Use the latest stable release of Unity (2022.3 in my case).
- Simply select the template 2D app with either URP or default rendering pipeline.
- Add a single square game object. Attach StartupScript.cs to this square; the script spawns a busy-loop thread with timing code. The script is attached to this post.
- When building the app, switch the project to iOS. Run the app on any iPad with an A8X or A10 chip. These correspond to iPad 6th or 7th gen and earlier. This problem does not exist on the latest A15 and later. A11, A12, A13, A14 not tested.
- Notice in the Xcode console log that it runs fast (~0.2 seconds) for about 4 seconds, and then it runs 3-5x slower (around 0.9 seconds) with a lot of nondeterministic variance.
The busy work is this, but any busy work will do:
for (var i = 0; i < 100000000; i++)
{
    if (i % 2 == 0) {
        sum *= i;
    } else if (i % 3 == 0) {
        sum /= i / 3;
    } else {
        sum += i;
    }
}
On its own, this takes about 0.2s to complete. When Unity interferes with it somehow, it takes 0.9s, the exact same slowdown I observed with my speech recognition model.
A natural question is “ah, maybe the thread isn’t getting time on the CPU due to limited resources,” but this isn’t true. The slowdown happens despite ample resources being available to the thread. From profiling in Xcode and Unity on the A8X, which has 3 cores, at any given time 2 cores are mostly idle while the one worker thread works. That worker thread gets about 95% of the time on a core when the target frame rate is 29, and about 85% when it’s 30. In both cases, the worker thread is preempted ~700 times per second, except that in the slow case these preemptions last 10x longer on average. I can discuss the profiling results in more detail if people think that’s relevant.
This thread was originally posted as “Performance issue. Background thread runs 4x slower at 30fps compared to at 29 fps.”, but I couldn’t figure out how to move that thread to a more appropriate section.
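For readers without the attachment, here is a minimal sketch of how such a measurement could be structured in plain .NET, with no Unity dependency. It is not the attached StartupScript.cs: the class name BusyWork, the double accumulator, and the iterations parameter are all assumptions made for illustration.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class BusyWork
{
    // Same branch structure as the loop above.
    // Assumption: the accumulator is a double (the original type is not shown).
    public static double Run(int iterations)
    {
        double sum = 0;
        for (var i = 0; i < iterations; i++)
        {
            if (i % 2 == 0) sum *= i;
            else if (i % 3 == 0) sum /= i / 3; // i / 3 is integer division and is >= 1 here
            else sum += i;
        }
        return sum;
    }

    // Runs the loop once on a dedicated thread and returns the elapsed wall-clock seconds.
    public static double TimeOnBackgroundThread(int iterations)
    {
        double seconds = 0;
        double result = 0; // kept so the loop's result is actually used
        var worker = new Thread(() =>
        {
            var stopwatch = Stopwatch.StartNew();
            result = Run(iterations);
            seconds = stopwatch.Elapsed.TotalSeconds;
        });
        worker.IsBackground = true;
        worker.Start();
        worker.Join(); // Join also makes the writes above visible to this thread
        return seconds;
    }
}
```

Calling BusyWork.TimeOnBackgroundThread(100000000) repeatedly and logging each result would give a time series comparable to the one described in this post.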
Attachment: StartupScript.cs (2.43 KB)
Here’s a plot of the speed of the busy work over time
The difference is between 20 and 30 fps, as you noted. That is 66% vs 100%. There’s about 33% more CPU time available for background tasks. Perhaps it’s as simple as that? Yes, there are multiple threads but the main thread may still be the bottleneck and you say it’s at 95% so practically not available at a moment’s notice.
Could you share how and where you start these asynchronous background tasks, and how they synchronize with the main thread? Are these async/await or System.Thread or Jobs or … ?
Thanks for the reply @CodeSmile ! Yes, the exact file with everything needed to reproduce this is the attachment called StartupScript.cs in the post above. The device is certainly not running out of CPU resources. The worker is at 95% of one core, meaning that total utilization is about ⅓ of the 3 cores, or roughly 30%. When I run larger workloads I see numbers closer to 280% utilization of the 3 cores.
There are no waits.
Some further notes:
- On Unity 2022.3, you actually have to set the target fps to <= 23 in order for the realtime fps to change to 20. If you set it to anything from 24 to 29, the realtime fps reverts to 30.
- In Unity 2022.2, if you set the target fps to anything from 18 to 29, the realtime fps gets pinned to 20.
- When I pause the app in the Xcode debugger and then hit run again, the background work runs quickly for a few seconds before getting 4x slower again.
I measure the realtime fps with:
private string _lastFpsText;
private void OnGUI()
{
    if (Time.frameCount % 10 == 0)
    {
        _lastFpsText = $"Realtime FPS: {1 / Time.deltaTime}";
    }
    GUI.Label(new Rect(20, 20, 1000, 50), _lastFpsText);
}
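One caveat with this measurement: OnGUI can be invoked more than once per frame (once per GUI event, such as layout and repaint), and 1 / Time.deltaTime reflects only a single frame. Below is a minimal sketch of an averaged counter, written as plain C# so it can be checked outside Unity. The FpsAverager class and its member names are hypothetical; in a Unity script one would presumably feed it Time.unscaledDeltaTime once per frame from Update.

```csharp
using System;

public sealed class FpsAverager
{
    private readonly int _windowSize;
    private double _accumulatedSeconds;
    private int _frames;

    // Average FPS over the most recently completed window (0 until one completes).
    public double LastFps { get; private set; }

    public FpsAverager(int windowSize)
    {
        if (windowSize <= 0) throw new ArgumentOutOfRangeException(nameof(windowSize));
        _windowSize = windowSize;
    }

    // Call once per rendered frame with that frame's duration in seconds.
    public void AddFrame(double deltaSeconds)
    {
        _accumulatedSeconds += deltaSeconds;
        _frames++;
        if (_frames < _windowSize) return;

        // Window complete: frames divided by elapsed time, then reset.
        LastFps = _accumulatedSeconds > 0 ? _frames / _accumulatedSeconds : 0;
        _accumulatedSeconds = 0;
        _frames = 0;
    }
}
```

With a window of 10 frames this updates at the same cadence as the snippet above, but reports the mean over those 10 frames rather than the duration of a single one.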
Below is a screenshot of profiling the app in Unity. You can see that for a few seconds on the left it runs fast, and then when the green spikes show up things run slow.
The main function that is being called is WaitForTargetFPS. In the case when the real FPS is 20, the profiling results look like this:
Related Stack Overflow threads with apparently the same performance characteristics as this problem (fast, then slow, then fast again when restarted in the debugger):
In neither case were they able to figure out the root cause. In #1, the poster worked around it by reducing the updates from 60 fps to only when needed.
One might wonder, “is this also true for apps built purely in Swift?” The answer appears to be no. I was able to run the speech model at 55 fps in an app written in Swift, where I verified the realtime FPS in the Xcode profiler. In the Swift app, going from 20 fps to 55 fps makes the speech recognition model roughly 20% slower, which is not much compared to a 400% slowdown.
Hi,
I have some thoughts on this:
+ A CPU-heavy task like your background task (one that never idles) is very stressful for the OS thread scheduler.
+ The OS schedules every thread’s work onto the hardware cores. A task that cannot be interrupted, such as one that just counts from 1 to infinity, needs a dedicated core to avoid being throttled at random by the scheduler.
Any task that takes a long time “should” be performed lazily.
I wonder what your actual background task is in detail, and whether it can run without throttling the CPU?
Hi @huonguvw , thanks for the thoughts.
- When I run the task with a slightly lower frame rate of 20 fps, that same intensive task runs quickly.
- When measuring in the profiler, there are no heating events and the temperature remains nominal. Furthermore, at any point while running the background work, if I pause the app for 0.5 seconds and start it up again, the busy work runs fast again for 5-10 seconds.
When I run the full speech recognition workload in an iOS app at 60 FPS, or in a Unity app at <= 20 FPS, the speech model utilizes 250% out of 3 cores without any breaks, and the temperature of the device stays nominal and does not overheat. The profiler also says that the speech model is not being throttled, and that it’s getting the expected amount of time “running” on the CPU.
This has piqued my interest and I want to take a look at this; I also responded on LinkedIn. A couple of things jump out right away from your script, which may or may not have anything to do with the root cause or the solution, but just brainstorming:
- Believe it or not, that Debug.Log statement could be a contributing factor. Debug.Log statements and debug builds in general unfortunately can have a rather large impact on performance by themselves.
- I see you have some commented out UnityEngine.Random.Range code in there… a lot of the Unity stuff is not thread safe, I assume you tried that out and it was even worse, and hence the commenting?
- Coroutines or async tasks might be a more performant way to do things in Unity; they are used quite frequently for this very reason.
- I find it interesting that it is render that appears to be taking up the majority of your performance; all render takes place in the main thread in Unity, seems like it is being blocked even though you’ve opened up a new thread, perhaps because this script is on a GameObject in the scene. Does it need to be? Have you tried a ScriptableObject, or instantiating an object completely outside of the scene graph that performs this work?
This thread is kind of old, but still has a lot of relevant stuff in it in 2023 as far as Unity best practices: multithreading - How to not freeze the main thread in Unity? - Game Development Stack Exchange
So I unfortunately don’t have a modern enough macOS machine or iOS device, so I cannot build this for iOS, but even just in the Unity editor I notice some things right away after pasting your script onto a 2D Square sprite:
- The thread isn’t being shut down properly when the app exits, so it keeps running and logging to my debug log even after I’ve stopped the app.
- The sum never increases beyond 10.
- Performance drops from ~1000 fps to ~250 fps; I have to do an apples-to-oranges comparison since I’m not set up for iOS, but a 75% performance hit is pretty huge.
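On the first point, a common way to make sure a worker thread stops with the app is a cooperative stop flag plus Join. The following is a minimal sketch in plain .NET, not a patch to the original script. StoppableWorker is a hypothetical name, and in a Unity script Stop() would presumably be called from OnDestroy or OnApplicationQuit.

```csharp
using System;
using System.Threading;

public sealed class StoppableWorker
{
    private readonly Thread _thread;
    private volatile bool _stopRequested;
    private bool _started;
    private long _iterations;

    // Number of completed work items, safe to read from any thread.
    public long Iterations => Interlocked.Read(ref _iterations);

    public StoppableWorker(Action workItem)
    {
        _thread = new Thread(() =>
        {
            // The flag is checked between work items, so one item should stay short.
            while (!_stopRequested)
            {
                workItem();
                Interlocked.Increment(ref _iterations);
            }
        })
        { IsBackground = true };
    }

    public void Start()
    {
        _started = true;
        _thread.Start();
    }

    // Requests a stop and waits for the thread to exit.
    // Returns true if the thread exited within the timeout (or was never started).
    public bool Stop(int timeoutMilliseconds = 2000)
    {
        _stopRequested = true;
        return !_started || _thread.Join(timeoutMilliseconds);
    }
}
```

Because the flag is only checked between work items, a 100-million-iteration loop would need to be the work item itself (or check the flag periodically inside the loop) for the thread to stop promptly.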
Out of curiosity, have you tried ECS? DOTS/ECS/Jobs are nowadays the recommended way to do multi-threading in Unity: