I’m trying to figure out the cause of intermittent frame stutters in our game. When profiling, on average everything happens in <3ms on the CPU, with Gfx.PresentFrame at ~9ms on the GPU (standalone Mac build). On occasion, though, the GPU kicks over to the next frame with a PresentFrame of 30+ms.
1: Normal frame, CPU finishes in 1.5ms, PresentFrame takes 9.5ms
2: CPU takes under 2ms, PresentFrame explodes to 25.8ms
3: Because of the previous frame, the CPU needs to wait ~10ms before continuing. PresentFrame takes another 32ms
4: CPU needs to wait nearly 27ms for the GPU to finish
This only seems to happen intermittently: FPS will be normal for a second, then PresentFrame will stall for 10 frames, then back to normal for 20 frames, then stall again for 4, and so on.
I’m somewhat at a loss as to what’s causing this.
Some points:
Standalone player is 488x864
Application.targetFrameRate is 60 (equates to VSync every frame, as it’s enforced on certain platforms)
Game is 2D, with a static map filling the screen (drawn via a RawImage on a canvas) and some UI on top. Total draw calls are ~30, with ~1k tris for certain 3D models
There are no GC allocations during the phases where this happens. The only memory allocations that occur during play are from network messages (~1 every 2-3 seconds)
There’s no culling as all objects are visible on screen
No Game or Scene tabs are open when profiling
There are no shadows, normals, reflections, HDR, etc
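For reference, the frame-pacing part of the setup above boils down to something like this (a minimal Unity C# sketch; the script name is made up):

```csharp
using UnityEngine;

// Hypothetical startup script matching the setup listed above.
public class FramePacingSetup : MonoBehaviour
{
    void Awake()
    {
        // Soft cap at 60 fps; on platforms that enforce VSync this
        // works out to one frame per vsync interval.
        Application.targetFrameRate = 60;

        // Note: hardware VSync itself is controlled by
        // QualitySettings.vSyncCount, not by targetFrameRate.
    }
}
```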
What I’ve tried so far, which hasn’t had an effect:
Turning off VSync
Setting all texture resolution to 1/8 quality (to check if it’s an issue with memory bandwidth)
Removed all UI and canvases except the map, draw calls down to 5
Removed all 3D models
Removed all particles
Set all shaders to Mobile unlit
Decreased the display resolution to half the size (244x432)
Enabled experimental graphics jobs
Disabled Metal support to go to OpenGLCore
Changed from forward rendering to deferred to legacy (currently using forward)
Keeping loaded shaders alive
I understand that PresentFrame is a sort of catch-all for showing the image on the screen, but I don’t know why it would jump 3-4x at random intervals for random durations.
Any ideas/suggestions to try would be very welcome - or if any Unity employee could explain a little of what goes on inside that method, I could try to figure out what’s triggering it.
Hi!
Can you please try setting the Application.targetFrameRate to -1 and check if the behaviour is still the same?
Also, which MacOS version are you on?
I would suggest setting it to -1 to make sure that the simulation frame rate doesn’t interfere with VSync.
Also, on OSX it would be best to use Instruments to dig into what’s actually happening on the GPU. It doesn’t seem that the built-in profiler can provide enough information to understand the problem. Present covers many things: processing uploaded GPU commands, waiting for vsync…
Another issue I see is that there is unaccounted time in the profiler which I can’t correlate with what we do there (looking at the code). There are cases when GC.Collect kicks in on another thread and pauses all other threads - it might be that.
To find out what’s in there, I suggest doing sampling profiling with Instruments.
Thanks for replying - sorry for the late reply on my part; I got pulled away on other tasks, and I’ve not used Instruments before, so it’s taking a while to get into it.
Doing a sample with Instruments, one of the things that stood out is that CPU usage is very erratic - going from 0 to 40 to 0 to 120, etc.
Doing a Time Profile, for specific time sequences the Job Queue is taking up 50% or more of the CPU (disclaimer: I’m far from an expert in Instruments, so I could be reading this wrong).
I understand Unity breaks up a lot of its internals into job-based work, but any ideas on what could be taking so long? The game itself doesn’t use the Job system, so whatever’s in there would be all Unity.
Is there a way to disable the Job system so everything runs on the main thread (to check if there’s an issue with locks/semaphores)? In the main PlayerLoop, every time something is added to the queue via JobQueue::EnqueueAll(), it takes ~1ms.
This is a project that’s been brought through a lot of different Unity versions, all the way from Unity 4, so it wouldn’t surprise me if it was something like a slow path retained for backwards compatibility.
Could you please check the kernel/system checkbox to see kernel data? Sometimes a lot of time is spent waiting on a semaphore. Taking into account self time (time minus the sum of child times), I suspect that JobQueue::ProcessJobs is just sitting in a semaphore most of the time.
Also what is the behavior with Application.targetFrameRate set to -1?
I’m assuming this means that something’s not releasing the semaphore as it should?
In most instances I could find, the underlying job had to do with the Animator. We use the Animator for our buttons, though there are only 3 active buttons in this scene. Does Unity use the Animator internally?
When setting Application.targetFrameRate to -1, the game is pretty much unplayable - pause, fluid, pause, fluid. Like watching stop motion
Is it possible to disable the Jobs system to test?
This is expected behavior - if there are no jobs to process, worker threads sit idle in a semaphore. That means the thread doesn’t do anything and waits for the system to wake it up.
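To illustrate what “sitting idle in a semaphore” looks like, here is a plain C# sketch (not Unity’s actual implementation, just the general pattern):

```csharp
using System.Threading;

// Illustrative worker loop: the thread blocks in the semaphore,
// consuming no CPU, until a job is enqueued.
class JobQueueSketch
{
    static readonly SemaphoreSlim jobsAvailable = new SemaphoreSlim(0);

    static void WorkerLoop()
    {
        while (true)
        {
            // Sleeps here until woken; a sampling profiler attributes
            // this wait to the worker's self time.
            jobsAvailable.Wait();
            // ...dequeue and run the job...
        }
    }

    static void Enqueue(/* job */)
    {
        // ...push the job onto the queue...
        jobsAvailable.Release(); // wakes one sleeping worker
    }
}
```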
Do you have VSync on? VSync will stall the main thread.
I think that in order to diagnose the problem further, it’s important to see what’s happening on the GPU during Gfx.PresentFrame. It slips past one vsync according to the sample time, and a GPU capture might be helpful.
You know, I’m not going to lie; I thought that Application.targetFrameRate was the “code” way of controlling VSync, so I had VSync disabled in Quality Settings. Reading up on it, though, that’s not the case: once VSync was set, standalone performance became a lot smoother.
We’re moving to 2018.2 and the Lightweight Render Pipeline, which should take even more load off the CPU/GPU.
That said, the Editor still lags in the same way as shown in the first post: everything’s fine, then a spike hits with Gfx.WaitForPresent/PresentFrame - even when nothing is moving on the screen, or nothing is being rendered at all (I turned off the camera).
Is there something special about the Editor in relation to VSync/Application.targetFrameRate? It doesn’t seem to respect them at all.
(On a separate note: can you recommend a good resource in getting up to speed with Instruments & Metal capture? I’m feeling somewhat out of my depth here - I don’t know how to see what’s happening inside PresentFrame for a given frame)
Application.targetFrameRate is a “soft” vsync and is useful for limiting the simulation rate when no hardware vsync is set (e.g. on servers). It’s not desirable to use targetFrameRate with vSyncCount set, as the CPU frame might drift and cause render commands to be scheduled later and miss the next vsync.
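In other words, for a standalone build you want hardware vsync alone, with the soft cap left at its default (a Unity C# sketch of that combination; the script name is made up):

```csharp
using UnityEngine;

// Hypothetical startup script: hardware vsync only, no soft cap.
public class VSyncOnly : MonoBehaviour
{
    void Awake()
    {
        // Hardware vsync: present once per vertical blank.
        QualitySettings.vSyncCount = 1;

        // Leave the soft cap disabled so it can't drift against vsync
        // and push render commands past the next vertical blank.
        Application.targetFrameRate = -1;
    }
}
```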
Yes, the Editor doesn’t use gfxdevice vsync for its views, afaik, to avoid waiting for each window to be vsync’ed.