My game is suffering from stuttering (sometimes 2+ seconds paused on a single frame) and general performance degradation over time. A memory leak would seem like the obvious cause, but even after 6 hours of stress-testing the more intense, garbage-heavy components, the footprint hasn’t grown at all according to the memory profiler.
Through deep profiling I have fixed a few things that are within my control, but I’m still seeing nasty stuttering every 5-20 seconds. It only starts after playing for at least 1-3 hours; prior to that everything runs smoothly.
What stumps me most is what the profiler claims is actually taking the time.
Some examples:
– ParticleSystem.Update2 (e.g. 2200ms)
– a couple of minutes later, a bunch of GfxDevice.D3D11.WaitForLastPresents taking 300-400ms each
– and then a single call to Animation.Sample taking 400ms a minute later
– raycasting likes to join the party, e.g. one single raycast took 320ms and then shortly afterwards one single SphereCastNonAlloc took 40ms
These causes of stuttering seem to be low-level Unity functions, so what can I do?
Any advice welcome
Have you considered that your computer may be running hot, with the CPU/GPU starting to throttle occasionally? That would fit the randomness of what is taking extra time.
You could run Furmark and Prime95 together for a couple of hours and observe their behaviour. Furmark has a running graph where you’d see any heat-related throttling as troughs/dips, specifically in GPU or memory frequency. You’d also learn whether the system is generally stable, i.e. that it doesn’t crash or glitch.
You could also run the game on a different machine to see if the behaviour is the same.
Try updating the graphics driver in case there’s a bug, and perhaps generally drivers for the motherboard too.
Hey, thanks for the reply. I should have pointed out, this isn’t happening on one specific machine but (as far as I know) to everyone. The game has been in beta for a few months and many of my testers are reporting it. It’s now in NextFest and I’m getting lots of messages about it from those players that stick with the game for long enough to see it.
In that case, another common source of such stutter can be garbage collection. Perhaps after a long enough time the amount of garbage that needs to be recycled simply grows, without actually leaking memory.
For instance if you have a list of strings that you concat, doing so would generate garbage. And the longer that list becomes, the more garbage it will create.
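To illustrate the string example, here is a minimal sketch in plain C#. The class and method names are made up for illustration and are not from the original poster's project. It contrasts repeated concatenation, where the garbage grows with the length of the list, with a reused StringBuilder.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, names are illustrative only.
public static class ChatLogText
{
    // Each += allocates a new string containing everything joined so far,
    // so the total garbage grows roughly quadratically with the line count.
    public static string JoinWithConcat(List<string> lines)
    {
        string result = "";
        foreach (string line in lines)
        {
            result += line + "\n";
        }
        return result;
    }

    // Appends into a caller-owned buffer that can be reused between calls.
    // Only the final ToString() allocates a new string.
    public static string JoinWithBuilder(List<string> lines, StringBuilder buffer)
    {
        buffer.Clear();
        foreach (string line in lines)
        {
            buffer.Append(line).Append('\n');
        }
        return buffer.ToString();
    }
}
```

Both methods return the same text; the difference is only in how many temporary strings are left behind for the collector.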
Since you tagged 2020.3, are you at least on the latest patch release of that version?
And if that version already has the incremental garbage collector you should try and turn that on.
It may be worthwhile to take this opportunity and try upgrading to 2021.3, then 2022.3 and maybe even Unity 6 - depending on how much of a headache those upgrades pose.
Thanks. No, it’s not GC either. The profiler would show if it was. Already using incremental.
Yeah, I’ve considered upgrading Unity version but 2020 was as far as it wanted to go without throwing a ton of errors. I’m on 2020.3.26 which was the latest LTS at the time. I see there have been a few newer releases since… I might try that, seems like a long shot though.
Yeah, I’ve spent a lot of time in the profiler, thanks (and I did fix a few things based on it). What’s left is just random-looking low-level stuff like a single raycast taking half a second, or 10 calls to Vector3.get_zero() taking 1330ms. It seems like something is freezing it up, and the profiler is blaming whatever happened to be executing at the time.
Are you profiling (release) builds? In the editor you’re likely to get spikes from the EditorLoop.
Random things taking unusually long makes me think multithreading (locks), interrupts and drivers. And by extension, the OS itself. Perhaps you can find some commonalities of tester systems such as the same OS version, graphics driver version, antivirus or CPU/GPU model.
When you use the profiler, those 2,000 ms are likely inflated by profiling overhead and may be no more than 20-200 ms without the profiler, so some sort of interrupt handling or driver stall and recovery might be a possible culprit.
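One way to check how long the stalls really are without the profiler attached is to log long frames from a release build. The following is only a sketch: the class name and the 100 ms threshold are my own assumptions, and I'm assuming GC.CollectionCount reflects collections under Unity's GC in your build.

```csharp
using System;
using UnityEngine;

// Hypothetical helper, not from the thread: logs frames whose wall-clock
// duration exceeds a threshold, plus GC activity since the previous frame.
public class FrameSpikeLogger : MonoBehaviour
{
    // Assumed threshold; tune it to what counts as a visible stutter.
    [SerializeField] private float spikeThresholdMs = 100f;

    private int lastGcCount;

    private void Start()
    {
        lastGcCount = GC.CollectionCount(0);
    }

    private void Update()
    {
        // Unscaled time so that Time.timeScale changes do not hide stalls.
        float frameMs = Time.unscaledDeltaTime * 1000f;
        int gcCount = GC.CollectionCount(0);

        // The log string is only built on a spike, so normal frames
        // do not allocate here.
        if (frameMs > spikeThresholdMs)
        {
            long heapMb = GC.GetTotalMemory(false) / (1024 * 1024);
            Debug.Log($"Spike: {frameMs:F1} ms at t={Time.realtimeSinceStartup:F1} s, " +
                      $"GC collections since last frame: {gcCount - lastGcCount}, " +
                      $"managed heap: {heapMb} MB");
        }

        lastGcCount = gcCount;
    }
}
```

Attach it to any always-active object and compare the player log from testers' machines against the durations the profiler reports.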
The question is: are there systems where this issue NEVER occurs? If so, it can’t be a general game code problem but more likely it’s related to the environment.
Random things taking unusually long makes me think multithreading (locks), interrupts and drivers. And by extension, the OS itself. Perhaps you can find some commonalities of tester systems such as the same OS version, graphics driver version, antivirus or CPU/GPU model.
Yeah, that makes sense. Although on the other hand, if this were a wider system/Unity issue and not directly related to my own code, then surely it would be commonly complained about.
The question is: are there systems where this issue NEVER occurs? If so, it can’t be a general game code problem but more likely it’s related to the environment.
I haven’t seen any systems yet where this definitely doesn’t happen. I’m currently stress testing on my main desktop PC, which I haven’t used for this purpose before. After 2 hours of randomly running about and tearing through dialogue, I’ll see if there are any stutters.
So I think I have it solved. Despite the profiler not mentioning GC, the problem does seem to be GC related. Specifically: some kind of problem with incremental (as suggested by zulo). My diagnosis at this time is that running incremental GC for an extended period, with reasonably large amounts of garbage to deal with, will eventually lead to overall framerate slowdowns and stutters, even at times when there’s no garbage to collect.
I didn’t want to revert to uncontrolled stop-the-world GC as that’s horrible, so what I have done now (and it seems good after 4+ hours of stress testing with loads of garbage and GPU activity) - during the menu I have GC on automatic, during normal play I have GC disabled, and during chat (which is when unavoidable garbage happens) I run a 1ms slice of incremental per frame, for 13 seconds max per opening of the chat window. I also do a full garbage collect on each save/load. I’m trying to minimise the amount of time that incremental runs for, as it really does look like it’s the root issue here. At the same time, of course, I don’t want overall memory footprint to grow.
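For anyone wanting to try something similar, the scheme described above could be sketched roughly like this. This is not the poster's actual code: the class, the state enum and the method names are hypothetical, and using Mode.Manual during chat is my assumption (the post doesn't say which mode is active there). As far as I know GCMode is not supported in the Editor, so test this in a player build.

```csharp
using System;
using UnityEngine;
using UnityEngine.Scripting;

// Hypothetical sketch of the per-state GC policy; all names are illustrative.
public class GcPolicy : MonoBehaviour
{
    public enum GameState { Menu, Play, Chat }

    private const ulong SliceNanoseconds = 1_000_000; // 1 ms slice per frame
    private const float MaxChatCollectSeconds = 13f;  // cap per chat opening

    private GameState state = GameState.Menu;
    private float chatCollectTimeLeft;

    public void SetState(GameState next)
    {
        state = next;
        switch (next)
        {
            case GameState.Menu:
                // Automatic collection while in the menu.
                GarbageCollector.GCMode = GarbageCollector.Mode.Enabled;
                break;
            case GameState.Play:
                // No collection at all during normal play.
                GarbageCollector.GCMode = GarbageCollector.Mode.Disabled;
                break;
            case GameState.Chat:
                // Assumption: Manual mode, so only the explicit slices run.
                GarbageCollector.GCMode = GarbageCollector.Mode.Manual;
                chatCollectTimeLeft = MaxChatCollectSeconds;
                break;
        }
    }

    private void Update()
    {
        if (state != GameState.Chat || chatCollectTimeLeft <= 0f)
        {
            return;
        }

        chatCollectTimeLeft -= Time.unscaledDeltaTime;
        // Run at most one short incremental slice this frame.
        GarbageCollector.CollectIncremental(SliceNanoseconds);
    }

    // Call on save/load, where a longer blocking pause is acceptable.
    public void CollectFully()
    {
        GarbageCollector.Mode previous = GarbageCollector.GCMode;
        // Leave Disabled mode first, since a collect is not expected to run there.
        GarbageCollector.GCMode = GarbageCollector.Mode.Enabled;
        GC.Collect();
        GarbageCollector.GCMode = previous;
    }
}
```

The game would call SetState when entering the menu, gameplay or the chat window, and CollectFully from the save/load path. With collection disabled during play the heap can only grow, so it's worth watching the footprint over long sessions.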
Oh. Now that you mention it … the Incremental GC has some caveats mentioned here at the bottom as to why you might want to disable it:
This general caveat still applies in 6.1:
Given these reasons, perhaps you’ll get an idea as to why this might trigger only in long-running sessions. I’m guessing that the number of instances that are unfavorable for Incremental GC grows over time until it reaches a point where the framerate drops. If that helps narrow down the cause, it might either be fixable or point to a generally undesirable behaviour (i.e. a bug).