I was not sure if this is the right place to post my question (please redirect me if not)
I have a game that is laggy when tested on Android.
I ran the profiler and can see that Rendering is eating up the performance.
Looking at the CPU usage I can see that Semaphore.WaitForSignal in Gfx.WaitForPresentOnGfxThread is eating up 75%.
I’ve read that it’s just an indication that the GPU is busy. So I checked the GPU usage, and there Render.OpaqueGeometry takes up 80% and 40-50% of that is RenderDeferred.Lighting (Mesh.DrawVBO).
You are clearly not GPU bound, as the GPU frame time is 0.45ms in that screenshot, while the CPU time is 41ms.
Those samples are more clearly explained in the Profiler documentation. Please read that section and check the Timeline view of the Profiler. I suspect it’s got to do with the timings between Render thread and Main Thread, and what’s happening on the Rendering thread is the relevant bit because that’s where that Semaphore is that the main thread is waiting for.
The manual also explains and links to all the relevant info regarding vSync and Application.targetFrameRate
What device are you testing on? AFAIK deferred is going to be slow on most mobile GPUs due to the increased bandwidth requirements. Even more so with 20 dynamic lights: that’s definitely far beyond the usual mobile capabilities, which makes me think those GPU usage numbers in the profiler are incorrectly reported.
Yeah, the 20 lights might be a part of the problem but it could just as well be a problem only on the CPU to process them all in the Rendering Thread. It might be that the CPU Rendering code already determined that most of them are not needed for drawing/culled so it doesn’t even bother the GPU with it.
I’d start from that assumption rather than assuming that the GPU profiling data underreports anything. If GPU profiling does one thing to muddy the waters compared to the actual GPU performance when not profiling, it is to inflate the cost, not to underreport it.
So it could be that the CPU Rendering logic could use some help to determine what is relevant to be sent to the GPU and then the problem just goes away. What that is is hard to tell until you’ve looked at that Render Thread data.
Speaking of GPU profiling cost: you might want to look at the Frame Debugger, and also check the batching counts in there and in the Rendering module with the GPU Profiler module turned off, because I believe that module might break the batching in order to report times more granularly.
The Huawei P6 is a phone from 2016, and these are its scores running the Manhattan demo:
While the OP didn’t post a screenshot of what kind of scene they are running on that phone, I doubt it could run even an empty scene with deferred rendering enabled at under 1ms per frame, so the GPU driver is likely reporting those timings incorrectly. It’s probably reporting only command submission times, not time taken to actually render those commands.
So here is a screenshot of the timeline:
The full:
Zoom in before:
Zoom in after:
This is the scene I’m running:
It’s just primitive polygons. However, the squares are spawned randomly and have lights inside.
I don’t really understand how the timeline helps?
Without the GPU profiling support (deep profiling support) it’s the same thing actually. Here is the profiler with batches selected without deep profiling support:
And that is the frame debugger:
This specific scene in that specific time had “only” 6 lights. However, the green blocks spawn randomly and you can have up to 20 lights in one scene at the same time currently.
So I went on with the assumption and disabled all the lights (except the skybox).
And I still have the same issue: the FPS is around 30, and Rendering with Gfx.WaitForPresentOnGfxThread takes up 84%.
See:
So I went on and created a new project to see whether it has to do with my project or my phone. As the FPS is fine in the new project (I’m at 250fps with an empty scene), I assume it’s not a phone issue?
Adding a cube and 6 point lights around it still gave me 250fps.
Adding the same skybox still gave me 250fps
Did you try that new empty project on a phone, and did you get the 250fps value from the Profiler CPU Usage chart with the vSync category disabled? The box for that category in the legend is usually yellow, but it is black in all your screenshots, so I’m assuming that’s the case. Phones won’t let you run at a higher frame rate than their display’s refresh rate in Hz (i.e. 60Hz display = 60 FPS max cap). I can’t currently find the refresh rate of a Huawei P6 but you can easily find it via Screen.currentResolution.
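For reference, a minimal sketch of how the refresh rate could be logged on the device. The class name RefreshRateLogger is made up for illustration, and it assumes a Unity version where Resolution.refreshRate is still available (newer versions mark it obsolete in favor of refreshRateRatio).

```csharp
using UnityEngine;

// Hypothetical helper: attach to any GameObject in the first scene
// and read the result from the device log (e.g. via adb logcat).
public class RefreshRateLogger : MonoBehaviour
{
    void Start()
    {
        // Screen.currentResolution describes the display's current mode.
        Resolution res = Screen.currentResolution;

        // Note: refreshRate is obsolete in newer Unity versions,
        // where refreshRateRatio replaces it.
        Debug.Log($"Display: {res.width}x{res.height} @ {res.refreshRate} Hz");
    }
}
```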
Knowing the refresh rate and what you are setting Application.targetFrameRate (please check out that page for more info) to, as well as what your vSync settings are is likely going to be key in understanding your issue here, as it seems like the Render Thread is mostly spending time in Gfx.Present*, while the GPU is already done (as per the GPU Profiler). So it is likely waiting for the next vBlank to come along and flip.
*(you have a version where the Timeline view is still bugged and doesn’t properly show the name for samples that overlap from the previous frames, so this is a mild assumption on my part, but you can verify it by selecting the previous frame and scrolling to where it changes.)
It shows you what happens in parallel on other threads and at which point in time e.g. the waiting on the render thread is done and the frame flipped.
If the render thread had something other than Gfx.Present dragging on far into the following frame, that would indicate that it is doing too much work. It actually … Hold on a sec
In that first screenshot in your latest post I can see the Camera.Render sample at the end of the frame taking 33ms, so that long sample that occurs in parallel to the Semaphore.WaitForSignal is likely not Gfx.Present but Camera.Render. So yeah, please recheck what that long sample on the Render thread is.
Just to clarify something: Deep Profiling support is not connected to GPU profiling support in any way. You can make builds with Deep Profiling Support and then turn off Deep Profiling in the Profiler’s toolbar to avoid the overhead of deep profiling scripting samples (the blue ones), which makes a noticeable change in your screenshots.
But the GPU is profiled when the GPU Module is added to the Profiler and the current hardware and project settings support it, deep profiling or not. Removing the module stops GPU profiling and removes the GPU profiling overhead with it too. Good to see it didn’t change anything with the batching but from that Frame Debugger screenshot and your project screenshot it doesn’t look like you’re doing too much batching anyways.
So looking into reducing your Drawcall count and how the lights, deferred rendering and shadows might play into that could be valuable to finding ways to improve your rendering times.
I also did not change the vSync settings so I guess default is don’t sync.
When being in the lobby (just UI overlay and nothing much happening) it’s actually worse, see screenshot:
There is no overlapping in the view. When zoomed out it looks like camera is overlapping but when zooming in it’s actually separate. Camera.Render is at 2.2ms.
Now trying to disable/enable stuff to see if/when things change, the profiler itself does not give me any clue…
Found it! It looks like it is the rendering of the 6 point lights I have. When I turn them off, I’m at 100 FPS.
Do you have a suggestion how I can keep illuminating my scene with at least 6 lights but have a better performance?
Please try actually setting this and read the manual section again. The default is -1 which on most mobile phones will default to 30 to conserve battery power. To be sure to get 60, you need to set it to 60, ideally only on phones that can actually handle that many frames, but for testing purposes we won’t need to worry about that for now.
Please double check this too and try other settings. You don’t even need to do it from code, you can do so via the Quality Settings under Other. (If you do it there rather than via script, just make sure the Quality level you are editing is the default one for Android. Check the matrix at the top of the Quality Settings.) I just checked the Quality Settings on some freshly created projects, and it looks like the default is Every V Blank. This would then actually override the targetFrameRate, but I’d still try setting vSync to Don’t Sync and targetFrameRate = 60 explicitly to check if that changes anything in your case.
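If you do want to try it from script, something along these lines should be enough for a test. This is only a sketch: the class name FrameRateTest is made up, the value 60 is just the test value discussed here, and whether vSyncCount is honored at all on a given mobile platform is worth confirming in the manual.

```csharp
using UnityEngine;

// Hypothetical test component: put it on an object in the first scene.
public class FrameRateTest : MonoBehaviour
{
    void Awake()
    {
        // 0 corresponds to "Don't Sync" in the Quality Settings.
        QualitySettings.vSyncCount = 0;

        // The default of -1 lets the platform pick its own rate;
        // request 60 explicitly for this test.
        Application.targetFrameRate = 60;

        Debug.Log($"vSyncCount={QualitySettings.vSyncCount}, " +
                  $"targetFrameRate={Application.targetFrameRate}");
    }
}
```

After that, compare the Profiler frame times against what you had before the change.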
Re Deferred Mode
In the Graphics Settings in the Tier Settings. Here too make sure that you’re actually looking at the tiers for Android, and if they are different between each other, check Graphics.activeTier to know which one is used on your phone.
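A quick way to check this on the device could look like the sketch below. TierLogger is a made-up name, and it assumes the scene has a camera tagged MainCamera so that Camera.main is not null.

```csharp
using UnityEngine;
using UnityEngine.Rendering;

// Hypothetical helper that logs which graphics tier the device ended up on
// and which rendering path the main camera actually uses.
public class TierLogger : MonoBehaviour
{
    void Start()
    {
        GraphicsTier tier = Graphics.activeTier;
        Debug.Log($"Active graphics tier: {tier}");

        Camera cam = Camera.main; // null if no camera is tagged MainCamera
        if (cam != null)
        {
            // actualRenderingPath is what is really used at runtime,
            // which can differ from the path that was requested.
            Debug.Log($"Actual rendering path: {cam.actualRenderingPath}");
        }
    }
}
```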
I didn’t mean Camera.Render on the Main thread, but on the Render Thread. Here I’ve highlighted, in Broken-Shader-Pink, the 33ms sample overlapping into the frame following the selected one. The yellow highlight shows Rendering samples overlapping into the selected frame from the previous one. The middle highlight shows what I initially presumed to be a Gfx.Present sample. Whatever that is, what can be critical is the time between the last child sample of Gfx.Present ending in one frame and it ending in the next. If that interval happens to be too tight a fit for the rate of vBlanks (i.e. slightly bigger than 16.66ms, 33.33ms, …) and you get a longer wait after that, chances are you missed the vBlank and need to wait for the next one, or even the second next one, depending on your vSync settings.
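To illustrate the missed-vBlank arithmetic, here is a simplified model in plain C#. It assumes strict vSync where a frame can only be shown on a vBlank, and it ignores buffering details and driver behavior; VBlankMath and its method names are made up for this sketch.

```csharp
using System;

// Simplified model: with vSync on, a frame is displayed on the first
// vBlank after it is finished, so its cost is rounded up to a whole
// number of refresh intervals.
public static class VBlankMath
{
    // Smallest whole number of refresh intervals that covers frameMs.
    public static int IntervalsNeeded(double frameMs, double refreshHz)
    {
        double intervalMs = 1000.0 / refreshHz; // 16.66 ms at 60 Hz
        return Math.Max(1, (int)Math.Ceiling(frameMs / intervalMs));
    }

    // Frame rate you would observe under this model.
    public static double EffectiveFps(double frameMs, double refreshHz)
    {
        return refreshHz / IntervalsNeeded(frameMs, refreshHz);
    }
}
```

Under this model, a 16 ms frame on a 60 Hz display fits into one interval and runs at 60 FPS, while a 17 ms frame just misses the vBlank, needs two intervals and drops to 30 FPS.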
There are several sections in the Manual and Learn sections that can probably explain this better than I can. Try for example this one, this one or this one.
6 per pixel point lights on a 4 year old phone is an unreasonable expectation. Really, anything above maybe 2-3 lights is probably the limit. And deferred doesn’t really help since phone GPUs are extremely bandwidth limited and deferred rendering is trading memory bandwidth for performance… which doesn’t work if you don’t have any memory bandwidth.
The way you’d do something like what you have above is … by faking as much as you can. You might be able to get away with a shiny cubemap for a fake reflection and some non-important lights for the main lighting, and maybe one or two per pixel “important” point lights. That might get you to 30 fps.
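As a rough sketch of the "one or two important lights" part: the component below keeps a small number of point lights per pixel and marks the rest as not important. It assumes the built-in pipeline with forward rendering (in deferred, every light is per pixel, so the render mode would not help), the name LightBudget and the budget of 2 are made up, and which lights get to stay important is arbitrary here.

```csharp
using UnityEngine;

// Hypothetical helper: caps the number of per-pixel point lights.
// Only affects lights that exist when Start runs; lights spawned
// later would need the same treatment when they are created.
public class LightBudget : MonoBehaviour
{
    [SerializeField] private int importantLightCount = 2; // assumed budget

    void Start()
    {
        int kept = 0;
        foreach (Light l in FindObjectsOfType<Light>())
        {
            if (l.type != LightType.Point) continue;

            if (kept < importantLightCount)
            {
                l.renderMode = LightRenderMode.ForcePixel;  // "Important"
                kept++;
            }
            else
            {
                l.renderMode = LightRenderMode.ForceVertex; // "Not Important"
            }
        }
    }
}
```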
Okay so here is the summary of my investigation and what I did about it:
I was able to identify 3 main problems:
I was using the method PlayClipAtPoint on pickup, which led to spikes in performance. (minor impact)
Fix: this was easy to fix, I just used audio.PlayOneShot instead. I put it on a global emitter. Sure, it’s not 3D audio, but the performance is way better and you don’t need 3D audio on mobile. I guess you can put the PlayOneShot on the pickup item to keep 3D audio.
The post-process layer that comes by default with Unity was affecting my game dramatically. (high impact)
Fix: I ended up using a paid 3rd party stack for the effects I was using. The quality is way lower than the ones that come packaged with Unity for free but the performance is way better.
Point lights are too heavy for mobile. (highest impact)
(this is what was figured out thanks to this thread)
Fix: I still did not find any fix for that. I guess bgolus is probably right when he says that it just won’t work on an old phone. So I will be adding graphics settings to the game where users can choose for themselves. Low settings will just disable/remove the point lights.
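For anyone finding this later, the low setting could be sketched roughly like this. The names PointLightQualityToggle and the "lowGraphics" PlayerPrefs key are made up for illustration, and it assumes the component sits on the spawned block prefab that contains the lights.

```csharp
using UnityEngine;

// Hypothetical component for the spawned blocks: when the player has
// chosen low graphics, it switches off the point lights inside the block.
public class PointLightQualityToggle : MonoBehaviour
{
    private const string PrefKey = "lowGraphics"; // assumed settings key

    // Would be set from the graphics settings menu.
    public static bool LowGraphics
    {
        get { return PlayerPrefs.GetInt(PrefKey, 0) == 1; }
        set { PlayerPrefs.SetInt(PrefKey, value ? 1 : 0); }
    }

    void OnEnable()
    {
        if (!LowGraphics) return;

        // true = also include lights on inactive child objects.
        foreach (Light l in GetComponentsInChildren<Light>(true))
        {
            if (l.type == LightType.Point)
            {
                l.enabled = false;
            }
        }
    }
}
```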