Incremental GC feedback thread

Unity 19.1a10 has experimental support for incremental garbage collection. You can find more information about the feature in this blog post.

I'm opening this forum thread as a place to discuss the feature and to collect any feedback. We are very interested in hearing from anyone trying this on projects (especially projects suffering from GC spikes) and in learning how incremental GC affects them. Any other type of feedback is of course very welcome as well.

11 Likes

Looks really interesting, @jonas-echterhoff_1 !

Thanks for letting us try it at such an early stage.

Here is a first simple experiment on Android:

3930754--335809--upload_2018-11-27_2-15-5.png

3930754--335812--upload_2018-11-27_2-16-4.png

3930754--335815--upload_2018-11-27_2-20-46.png

The last screenshot reveals the nature of incremental GC: the Profiler shows GarbageCollector.CollectIncremental filling the whole WaitForTargetFPS (wait for vsync) period, with GC.Collect running a portion of its work within the CollectIncremental time.

If I understand it correctly, this picture is as expected: incremental GC does some work to define the slice boundaries, then runs a chunk of the synchronous GC.Collect() in the given frame, and then does some additional work to prepare for the next frame.

And this is much, much better than a single 9ms spike with incremental GC turned off for the same scene:

3930754--335818--upload_2018-11-27_2-28-30.png

I'm really happy to see this is coming and will be available in 19.1.

Though I'm afraid this will make developers less strict about avoiding heap allocations and more likely to ignore the GC allocation problem, leading to more GC issues in the late stages of a project =D

3 Likes

[quote=“jonas-echterhoff_1”, post:1, topic: 722832]
Unity 19.1a10 has experimental support for incremental garbage collection.
[/quote]

We see significant performance problems in any managed code that allocates memory, independent of garbage collection spikes - code that allocates just runs more slowly. My theory is that the Boehm GC approach means fresh allocations constantly spill into fresh cache lines, so code that allocates will almost always be hit with a performance-crippling cache miss.

I had hoped that the rumoured “new garbage collector” would be a generational garbage collector with good cache utilization for short-lived allocations. Is there an initiative at Unity to support generational GC, or is incremental Boehm the best we can hope for? Reducing spikes is great, but if allocation continues to hurt performance then we will continue to avoid allocations as much as humanly possible.

1 Like

[quote=“yoyobbi”, post:3, topic: 722832]
We see significant performance problems in any managed code that allocates memory, independent of garbage collection spikes - code that allocates just runs more slowly. My theory is that the Boehm GC approach means fresh allocations constantly spill into fresh cache lines, so code that allocates will almost always be hit with a performance-crippling cache miss.

I had hoped that the rumoured “new garbage collector” would be a generational garbage collector with good cache utilization for short-lived allocations. Is there an initiative at Unity to support generational GC, or is incremental Boehm the best we can hope for? Reducing spikes is great, but if allocation continues to hurt performance then we will continue to avoid allocations as much as humanly possible.
[/quote]
This would be one of the benefits of a precise GC. Boehm is a conservative collector, which means it cannot tell the difference between a real pointer and an integer value. So compacting memory is not possible with Boehm, and neither is generational marking. Precise GCs (SGen, CoreCLR’s GC, the JVM’s GCs) compact memory, which means moving live objects together in order to eliminate memory fragmentation and to improve cache locality.
But currently it is very unlikely that Unity will adopt a precise GC, because none of those work with IL2CPP. It is difficult to get the stack map out of a C++ compiler, and the stack map is crucial for a precise GC.
Using a precise GC at this point would mean abandoning IL2CPP and switching to a code generation system built on a JIT compiler backend, like Mono AOT or CoreRT. CoreRT is currently not production-ready and doesn’t support iOS.
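
To make the conservative versus precise distinction above concrete, here is a minimal toy sketch in Python. It is not how Boehm, SGen, or IL2CPP are implemented; the heap, the stack layout, and every name in it (`HEAP`, `conservative_roots`, `precise_roots`, `stack_map`) are hypothetical and exist only for illustration.

```python
# Toy heap: address -> object. All names and addresses are made up.
HEAP = {0x1000: "player", 0x2000: "enemy"}


def conservative_roots(stack_words):
    """Treat every stack word that looks like a heap address as a pointer."""
    return {w for w in stack_words if w in HEAP}


def precise_roots(stack_words, stack_map):
    """Use a stack map (one flag per slot) to consider real pointers only."""
    return {
        w for w, is_pointer in zip(stack_words, stack_map)
        if is_pointer and w in HEAP
    }


# Slot 0 holds a real reference. Slot 1 holds an integer that happens to
# equal the address of "enemy". Slot 2 is an ordinary integer.
stack = [0x1000, 0x2000, 42]
stack_map = [True, False, False]

conservative = conservative_roots(stack)      # keeps both objects alive
precise = precise_roots(stack, stack_map)     # keeps only "player" alive
```

In this sketch the conservative scan has to keep "enemy" alive, and it could not move "enemy" to a new address either, because rewriting slot 1 might corrupt what is really an integer. The precise scan only works because the stack map exists, which is the piece described above as hard to obtain from a C++ compiler.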

1 Like

[quote=“codestage”, post:2, topic: 722832]
Looks really interesting, @jonas-echterhoff_1 !

Thanks for letting us try it at such an early stage.

The last screenshot reveals the nature of incremental GC: the Profiler shows GarbageCollector.CollectIncremental filling the whole WaitForTargetFPS (wait for vsync) period, with GC.Collect running a portion of its work within the CollectIncremental time.

If I understand it correctly, this picture is as expected: incremental GC does some work to define the slice boundaries, then runs a chunk of the synchronous GC.Collect() in the given frame, and then does some additional work to prepare for the next frame.
[/quote]

Thanks for testing! From your screenshots, though, it looks like you don’t actually have vsync enabled, so the player runs at >100fps? If you enable vsync, the GC should have a better idea of how much time it can use. If you don’t, try changing the value of GarbageCollector.incrementalTimeSliceNanoseconds.

Yes, this is a concern I share - people might make up for the better time distribution by writing less optimal code, and then not benefit in the end. Though you could argue that there is still benefit, if you can get to a similar result with less hard optimization work.
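
As a rough illustration of the time-slice idea discussed above, the following Python sketch marks a small object graph in slices, each limited by a nanosecond budget. It is a simulation only, not Unity's or Boehm's implementation; `mark_slice`, the `clock` parameter, and the sample graph are hypothetical names chosen for this example.

```python
import time
from collections import deque


def mark_slice(graph, worklist, marked, budget_ns, clock=time.perf_counter_ns):
    """Mark reachable objects until the worklist is empty or the budget is spent.

    Always processes at least one object so every slice makes progress.
    Returns True once marking is complete.
    """
    deadline = clock() + budget_ns
    while worklist:
        obj = worklist.popleft()
        if obj not in marked:
            marked.add(obj)
            worklist.extend(graph[obj])  # queue everything this object references
        if clock() >= deadline:
            break  # out of budget: resume in the next frame
    return not worklist


# Object graph: name -> names it references. "garbage" is unreachable.
graph = {"root": ["a", "b"], "a": ["c"], "b": [], "c": [], "garbage": []}

worklist, marked = deque(["root"]), set()
frames = 1
while not mark_slice(graph, worklist, marked, budget_ns=1_000_000):
    frames += 1  # a real game would render a frame here

unreachable = set(graph) - marked  # only "garbage" remains unmarked
```

The budget plays the role that GarbageCollector.incrementalTimeSliceNanoseconds plays in the discussion above: a smaller budget spreads the same marking work over more frames, and a larger one finishes sooner at the cost of more time per frame.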

1 Like

[quote=“yoyobbi”, post:3, topic: 722832]
I had hoped that the rumoured “new garbage collector” would be a generational garbage collector with good cache utilization for short-lived allocations. Is there an initiative at Unity to support generational GC, or is incremental Boehm the best we can hope for? Reducing spikes is great, but if allocation continues to hurt performance then we will continue to avoid allocations as much as humanly possible.
[/quote]

Right now, no. But as I wrote in the linked blog post, incremental Boehm seemed like the smallest (and thus, safest) step to take towards a better GC, and should help solve the biggest problem people seem to have (spikes). Once this is shipping and stable, we are at a better point to switch to other GC solutions, as the write barrier part needed by pretty much any modern GC is solved then. We will continue to listen to feedback and consider future steps based on that.

That said, no possible solution is a silver bullet. Unity’s requirements don’t necessarily match those of other software, so what works well somewhere else might not work well for Unity. E.g., users have repeatedly asked about switching to SGen, which I have tested, and I did not get overall better performance results in Unity content.
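
For readers wondering why a write barrier is needed once marking is spread over several frames, here is a small Python simulation. It assumes a simple insertion-style barrier and a toy heap; the `Heap` class, its methods, and the `run` scenario are hypothetical and do not reflect the actual Unity or Boehm code.

```python
from collections import deque


class Heap:
    """Toy heap with incremental marking. Purely illustrative."""

    def __init__(self):
        self.fields = {}        # object -> list of objects it references
        self.marked = set()
        self.worklist = deque()
        self.marking = False

    def alloc(self, name):
        self.fields[name] = []
        return name

    def write(self, src, dst, barrier=True):
        """Store a reference src -> dst, optionally through a write barrier."""
        self.fields[src].append(dst)
        if barrier and self.marking and dst not in self.marked:
            self.worklist.append(dst)  # tell the collector about the new edge

    def start_mark(self, roots):
        self.marking = True
        self.marked.clear()
        self.worklist = deque(roots)

    def mark_step(self, max_objects):
        """Scan up to max_objects objects. Returns True when marking is done."""
        for _ in range(max_objects):
            if not self.worklist:
                break
            obj = self.worklist.popleft()
            if obj not in self.marked:
                self.marked.add(obj)
                self.worklist.extend(self.fields[obj])
        return not self.worklist


def run(barrier):
    heap = Heap()
    root, temp, late = heap.alloc("root"), heap.alloc("temp"), heap.alloc("late")
    heap.write(root, temp)
    heap.write(temp, late)
    heap.start_mark([root])
    heap.mark_step(1)                # slice 1: root is scanned, temp is queued
    # Between slices, game code moves the only reference to `late` into the
    # already scanned root and removes it from temp.
    heap.write(root, late, barrier)
    heap.fields[temp].clear()
    while not heap.mark_step(1):
        pass
    return "late" in heap.marked     # False means a live object would be freed


with_barrier = run(barrier=True)
without_barrier = run(barrier=False)
```

Without the barrier the collector never learns about the new reference from the already scanned root, so a live object ends up unmarked. With the barrier the write itself queues the object for marking.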

4 Likes

Thanks for your reply, Jonas!

[quote=“jonas-echterhoff_1”, post:5, topic: 722832]
From your screenshots, it looks like you don’t actually have vsync enabled
[/quote]

It actually was built with the Every V Blank setting:

3931951--335983--upload_2018-11-27_11-40-15.png

Though I agree the CPU graph looks unusual for a Player with VSync enabled.

What's the logic for sweeping with this? Does it have anything resembling generations or other knobs we can tweak?

[quote=“snacktime”, post:8, topic: 722832]
What’s the logic for sweeping with this? Does it have anything resembling generations or other knobs we can tweak?
[/quote]
The only knob to tweak is the maximum time spent on scanning per frame.
No big logic changes and no generational GC yet.

As jonas-echterhoff explained in post#6 you can view it as a sort of preparation stage for coming changes that also already fixes the biggest issue we have with the GC (which is frame time spikes).

1 Like

[quote=“codestage”, post:7, topic: 722832]
Thanks for your reply, Jonas!

It actually was built with the Every V Blank setting:

Though I agree the CPU graph looks unusual for a Player with VSync enabled.
[/quote]

I think the profiler graph may be wrong here. Looking at the reported total frame time of ~42ms, that does not match the graphed frame rate between 100-200 fps. I think there were some bugs in profiler graph rendering in 19.1, I’ll check with our profiler developers.

1 Like

[quote=“dadude123”, post:9, topic: 722832]
As jonas-echterhoff explained in post#6 you can view it as a sort of preparation stage for coming changes that also already fixes the biggest issue we have with the GC (which is frame time spikes).
[/quote]

Just to make sure I’m not overpromising: There are no specific “coming changes” planned after incremental GC. GC spikes are clearly the biggest user issue with GC today, so we are setting out to fix those. Once that has landed and is out of experimental, we will listen to feedback and evaluate what are the most pressing issues to work on, and plan further steps based on that.

1 Like

@jonas-echterhoff_1 thanks!

Yeah, the 42ms total looks strange to me too.

1 Like

[quote=“codestage”, post:12, topic: 722832]
@jonas-echterhoff_1 thanks!

Yeah, the 42ms total looks strange to me too.
[/quote]

FYI, this is the bug in question: https://issuetracker.unity3d.com/issues/profiler-data-does-not-match-the-numeric-data-in-its-hierarchy

2 Likes

[quote=“jonas-echterhoff_1”, post:11, topic: 722832]
Just to make sure I’m not overpromising: There are no specific “coming changes” planned after incremental GC.
[/quote]

Thanks for clarifying. Spike reduction is definitely a great step forward, so thank you for that.

We will continue to avoid allocating memory in order to maintain decent cache performance. I guess the good news is that all the tricks we’ve learned and pooling mechanisms we’ve built aren’t about to become obsolete after all. :)

In future with ECS + jobs + Burst compilation - all premised on native arrays of value types - we should be writing more cache-friendly code with less allocation.

[quote=“yoyobbi”, post:14, topic: 722832]
I guess the good news is that all the tricks we’ve learned and pooling mechanisms we’ve built aren’t about to become obsolete after all.
[/quote]
In a managed environment they will never become obsolete, even with a generational GC. Even if Unity someday gets a modern GC, you will still have to pool almost everything.
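
For anyone new to the pooling mentioned above, a minimal language-agnostic sketch of the pattern in Python might look like this. The `Pool` class, `reset_bullet`, and the bullet dictionaries are hypothetical names for illustration; a real Unity project would pool its own C# objects or GameObjects.

```python
class Pool:
    """Reuses released objects instead of allocating new ones."""

    def __init__(self, factory, reset):
        self._factory = factory   # creates a new object when the pool is empty
        self._reset = reset       # restores an object to its initial state
        self._free = []
        self.created = 0          # how many objects were ever allocated

    def acquire(self):
        if self._free:
            return self._free.pop()
        self.created += 1
        return self._factory()

    def release(self, obj):
        self._reset(obj)
        self._free.append(obj)


def reset_bullet(bullet):
    bullet["x"] = 0.0
    bullet["alive"] = False


pool = Pool(factory=lambda: {"x": 0.0, "alive": False}, reset=reset_bullet)

# Simulate 100 frames that each need three short-lived bullets.
for frame in range(100):
    live = [pool.acquire() for _ in range(3)]
    for bullet in live:
        bullet["alive"] = True
        pool.release(bullet)

total_allocations = pool.created  # 3, instead of 300 without pooling
```

After the first frame every request is served from the free list, so the steady-state loop allocates nothing new, whichever collector is in use.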

1 Like

Any ideas why enabling Incremental GC doesn't seem to be doing anything? Even in a brand new project on 2019.1.0a11, with the only changes being setting Scripting Runtime to 4.x and enabling Incremental GC in Player Settings, with a simple test script, I'm still seeing GC being run as a single frame, and without the GarbageCollector.Incremental call in my profiler.

[quote=“KillHour”, post:16, topic: 722832]
Any ideas why enabling Incremental GC doesn’t seem to be doing anything? Even in a brand new project on 2019.1.0a11, with the only changes being setting Scripting Runtime to 4.x and enabling Incremental GC in Player Settings, with a simple test script, I’m still seeing GC being run as a single frame, and without the GarbageCollector.Incremental call in my profiler.
[/quote]

Which platform are you testing this on?

[quote=“jonas-echterhoff_1”, post:17, topic: 722832]
Which platform are you testing this on?
[/quote]

Windows.

[quote=“KillHour”, post:18, topic: 722832]
Windows.
[/quote]
Testing in editor or player? Incremental GC is only supported on players atm. Also, how long is your GC spike? If it is very short, there might not be a point in spreading it over multiple frames.

That explains it. I was testing in the editor.