GPU Driven Rendering In Unity

Hi everyone!

For quite some time we have been working on deep performance improvements to how Unity renders and batches draws. These improvements are designed to ‘just work’ out of the box with projects you have already created with URP and HDRP. We would like to share this work with you now so that we can get some feedback. We want it to be rock solid and to work on all platforms that are capable of handling this improvement. What is described in this post is available in 2023.3a8.

Background
About a year ago we introduced the reworked BatchRendererGroup API for Unity 2022.1. While this API allows for some great performance benefits, you either need to use Entities Graphics or do a lot of custom coding to put your objects into it. We thought this was not good enough: we want many more Unity projects to benefit from faster batching and performance without needing to be modified. So we decided to write a system to make this possible…


Garden Scene Running with the GPU Resident Drawer.

GPU Resident Drawer
In the latest 2023.3 alpha we have landed a new rendering system which is called the GPU Resident Drawer. This is a ‘behind the curtain’, GPU driven, system that allows you to author your game using game objects and when processed they will be ingested and rendered via a special fast path that handles better instancing. The improvements you will see using this feature are dependent on the scale of your scenes and the amount of instancing you utilize. The more instanceable objects you render the larger the benefits you will see. This feature is specifically for standard MeshRenderers. It will not handle skinned mesh renderers, VFX Graphs, particle systems or similar effects renderers.

How to enable the GPU Resident Drawer
The system can be enabled within the HDRP or URP Render Pipeline Asset. You should find the option GPU Resident Drawer Mode. Selecting Instanced Drawing enables the feature, and you can also select if you want the feature to be enabled just in play mode or in edit mode as well.

URP:

HDRP:

Some specific settings also need to be set. There are UI affordances that will tell you if your project is not configured properly. The specifics are:

  • BatchRendererGroup variant stripping needs to be set to Keep All, otherwise standalone player builds will not render the converted objects.

  • In URP you must be in Forward+ rendering mode.

  • Static batching should be turned off. This is not required, but with static batching off instancing will do a better job, which results in fewer draw calls.

  • Under Lightmapping Settings in Lighting Settings, check Fixed Lightmap Size and uncheck Use Mipmap Limits. This is also not required, but it will also result in fewer draw calls.

The system supports dynamic changes to game objects - the conversion runs incrementally and will pick up newly created objects as well as changed objects. This happens once per frame after LateUpdate but before Rendering begins - this means that if you are moving objects during rendering (for example in the RenderPipelineManager.beginCameraRendering callback) they may have incorrect data when rendering happens. You will want to force these objects to NOT be rendered via the GPU Resident Drawer using the “DisallowGPUDrivenRendering” MonoBehaviour.
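To illustrate the ordering problem described above, here is a minimal sketch in Python rather than Unity code. SceneObject, ResidentCache, move and run_frame are hypothetical names invented for this illustration; they only model the "sync once per frame after LateUpdate" rule and are not how the drawer is actually implemented.

```python
from dataclasses import dataclass, field


@dataclass
class SceneObject:
    name: str
    position: tuple
    dirty: bool = True  # newly created objects count as changed


@dataclass
class ResidentCache:
    """Hypothetical stand-in for the per-object data kept resident on the GPU."""
    positions: dict = field(default_factory=dict)

    def sync(self, objects):
        """Incremental conversion: copy only objects that changed since the last sync."""
        for obj in objects:
            if obj.dirty:
                self.positions[obj.name] = obj.position
                obj.dirty = False


def move(obj, position):
    obj.position = position
    obj.dirty = True


def run_frame(objects, cache, late_update=None, begin_camera_rendering=None):
    """Return the positions the fast path would draw with this frame."""
    if late_update:
        late_update(objects)
    cache.sync(objects)  # once per frame, after LateUpdate, before rendering
    if begin_camera_rendering:
        begin_camera_rendering(objects)  # too late: the cache is already synced
    return dict(cache.positions)


crate = SceneObject("crate", (0, 0, 0))
cache = ResidentCache()

# Moving in LateUpdate is picked up in the same frame.
frame1 = run_frame([crate], cache, late_update=lambda objs: move(objs[0], (1, 0, 0)))

# Moving in a per-camera callback is not: this frame still draws the old position.
frame2 = run_frame([crate], cache, begin_camera_rendering=lambda objs: move(objs[0], (2, 0, 0)))

print(frame1["crate"], frame2["crate"], crate.position)
```

In this model the second frame draws the crate one move behind its real position, which is the situation where the post suggests opting the object out with DisallowGPUDrivenRendering.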

Finally, the system is also compatible with Umbra occlusion culling so if you are already using that in your projects you will continue to see that benefit.

Objects rendered via the GPU Resident Drawer show up in the frame debugger as ‘Hybrid Batch Group’. In the spaceship scene the full g-buffer is laid down using the GPU Resident Drawer path.

Feature Support
The GPU Resident Drawer is supported on the modern rendering backends within Unity. Specifically, this functionality should work anywhere compute shaders are enabled. When your project is running on a platform that does not meet the required hardware capabilities, rendering will fall back to the traditional, non-GPU-driven path.

One further specific note: OpenGL and GLES are explicitly not supported. The GPU Resident Drawer will fall back to regular game object rendering on these rendering backends even if they support compute shaders.

When the feature is enabled some objects may still not render via the new path. In these cases they will draw via the regular rendering paths.

Specific cases that are not compatible:

  • The light probe usage on the renderer is set to use proxy volume
  • The renderer is affecting or is affected by real time global illumination
  • The renderer has a MaterialPropertyBlock attached
  • The shader used by the material is incompatible with DOTS Instancing
  • The renderer has per instance rendering callbacks attached (OnRenderObject etc)
  • The GameObject has the DisallowGPUDrivenRendering component attached
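
The list above reads as a per-renderer eligibility check. A rough sketch of that check follows, in Python rather than Unity code. RendererState and its field names are hypothetical flags made up to mirror the bullets, not a Unity type, and the real system may test these conditions differently.

```python
from dataclasses import dataclass


@dataclass
class RendererState:
    """Hypothetical flags mirroring the incompatibility list; not a Unity type."""
    light_probes_use_proxy_volume: bool = False
    uses_realtime_gi: bool = False
    has_material_property_block: bool = False
    shader_supports_dots_instancing: bool = True
    has_per_instance_callbacks: bool = False  # e.g. OnRenderObject
    has_disallow_component: bool = False      # DisallowGPUDrivenRendering


def fallback_reasons(r):
    """List every reason this renderer would stay on the regular path."""
    checks = [
        (r.light_probes_use_proxy_volume, "light probe proxy volume"),
        (r.uses_realtime_gi, "real time global illumination"),
        (r.has_material_property_block, "MaterialPropertyBlock attached"),
        (not r.shader_supports_dots_instancing, "shader lacks DOTS Instancing support"),
        (r.has_per_instance_callbacks, "per instance rendering callbacks"),
        (r.has_disallow_component, "DisallowGPUDrivenRendering attached"),
    ]
    return [reason for failed, reason in checks if failed]


def uses_resident_drawer(r):
    """A renderer takes the fast path only if no fallback reason applies."""
    return not fallback_reasons(r)


plain = RendererState()
tinted = RendererState(has_material_property_block=True, has_disallow_component=True)
print(uses_resident_drawer(plain))
print(fallback_reasons(tinted))
```

Collecting the reasons instead of returning a bare boolean makes it easier to see why a particular object fell back to the regular path.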

Compatibility Notes
Not all objects can be rendered using the GPU Resident drawer and you may need to manually mark some objects to not render via this path.

The situations we are aware of where you might need to do this:

  • You are using a ‘custom pass’ in URP and that custom pass does not support the DOTS instancing keyword but the main material does. This cannot be detected by the system and the custom pass will fail to render.

  • You are updating the transform on a per camera rendering basis. We update the objects in the GPU cache one time per frame right before the Unity rendering pipeline is executed. It is not recommended to update object positions while the render pipeline is activated (i.e. per camera) but if you must do this then you will need to mark the objects to not go via the GPU pipeline.

To force a GameObject to render via the GameObject path instead of the GPU Resident Drawer, add the new DisallowGPUDrivenRendering component to it. If you need to use this component for situations outside of those listed above, please let us know why so we can improve the system or documentation.

Performance
The GPU Resident drawer is specifically a CPU time optimization and may change GPU performance characteristics; please read to the end of this section to understand more.

How much CPU time is gained varies depending on the content that is rendered. Specifically, content with more instancing will benefit more, as fewer draw calls need to be submitted to the GPU. From our testing we have seen some larger scenes benefit massively, halving the CPU frame time. Smaller scenes also tend to benefit but often only show marginal improvements.
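
As a rough way to see why instancing-heavy content gains the most, the sketch below (Python, not Unity code) counts draws for two made-up scenes. It assumes, as a simplification, that all renderers sharing a (mesh, material) pair collapse into one instanced draw; real batching depends on more state than that, and the scene contents and function names are invented for illustration.

```python
def draws_one_per_renderer(renderers):
    """Baseline for comparison: every renderer is submitted as its own draw."""
    return len(renderers)


def draws_one_per_unique_pair(renderers):
    """Simplified instancing model: one draw per distinct (mesh, material) pair."""
    return len(set(renderers))


# Kit-built level: a few meshes repeated many times.
kit_scene = [("wall", "brick")] * 800 + [("pillar", "stone")] * 150 + [("crate", "wood")] * 50

# Scene made of unique parts: every mesh is different.
unique_scene = [(f"part_{i}", "steel") for i in range(1000)]

for name, scene in (("kit", kit_scene), ("unique", unique_scene)):
    print(name, draws_one_per_renderer(scene), draws_one_per_unique_pair(scene))
```

Under this model the kit-built scene drops from 1000 draws to 3, while the scene of unique meshes stays at 1000, which is why profiling on your own content matters.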

Here we have some numbers from an internal test project. Your scenes may differ, so please profile your own projects to be sure.

On the project in the Editor running on Metal we go from about 15ms main thread rendering and 31ms render thread time with the regular Game Objects path

CPU Time Improvements
[Screenshot 2023-10-06 at 11.35.56]

GPU performance notes
GPU performance may be negatively affected by drawing using the DOTS Instancing variant and this will be different depending on the device the content is rendering on. This is due to how data is loaded by the shaders which is different when using this feature. This effect will be more prominent on lower powered mobile GPUs but in many cases is also offset by the reduced number of draw calls. We would love to hear your feedback on the performance when using this feature on the projects that you are developing.

Additional Notes

  • Culling might differ slightly as the culling code has been reimplemented in C#. If you notice issues here please report them.

  • Setting BatchRendererGroup variant stripping to Keep All will increase shader variant count for player builds. This means that you may have longer build times.

  • Lightmaps are handled differently when this code path is enabled: we use TextureArrays with a dynamic index in the shader to look up the lightmap to use. This will lead to increased GPU memory use for lightmaps when this feature is enabled. We are investigating how to improve this.

How Can I Provide Feedback
The best way to provide feedback is in this thread. Try this feature on your projects and report any issues or performance numbers here. If there are any bugs encountered or similar we’ll likely ask for a bug report, but feel free to post here first.

tried in simple scene, maybe did something wrong?

default URP
[screenshot]

GPU Resident Drawer (with recommended settings)
[screenshot]

Can you share the scene or details? Seems like you are hitting a path where you can save like 17k draw calls (massive amount) with dynamic or static batching. Static batching still works with GPU driven so maybe using a combination of both would be good in your use case?

Either way would love the scene to assess where the delta is in your use case.

Made some more tests; I think the SRP Batcher was not enabled earlier.

[screenshot]

[screenshot]

Best results:

[screenshot]

[screenshot]

Private CAD model, so I cannot share it, but if I get similar results on public models, I'll post.
[screenshot]

Ahhhhhh ohh yeah.

This system in general improves the performance of rendering (i.e. comparing the exact same setup off vs on), but you will only really see big benefits when there is lots of instancing (kitbash designed worlds work well for this).

From the post:
[quote]
This is a ‘behind the curtain’, GPU driven, system that allows you to author your game using game objects and when processed they will be ingested and rendered via a special fast path that handles better instancing. The improvements you will see using this feature are dependent on the scale of your scenes and the amount of instancing you utilize. The more instanceable objects you render the larger the benefits you will see.
[/quote]

CAD models are a case where there is generally no instancing and a lot of unique meshes... this means there is still going to be an individual draw call per mesh. Static batching here will generally be a good path, as it will merge small meshes and reduce the number of draws sent to the GPU.

Resident drawer + static batching is good for your case as you get the benefits of the mesh combination as well as the faster backend drawing. So it's great to see that. In general the CAD model type cases are not the primary beneficiary of speedups here. We'll hopefully have something more for this case in the future ;)

Specific cases that are not compatible:

  • The renderer is affecting or is affected by real time global illumination

As in Enlighten, or APV?

Enlighten will not work. APV should work and we did validation on this.


Ah gotcha, so APV are fine then

This is really exciting! Haven't tested yet, but the initial numbers sound amazing. I noticed the GitHub commit for this change was called "GPU Driven Rendering - Tier 0". Does that imply more is in the works?

CPU performance has always been a bit of a pain in the SRPs, but this and the upcoming CoreCLR migration are going to do wonders. The only thing I'm worried about is the increased shader variants, and more specifically, how much slower build times are going to be. (Unity needs a major shader compilation rework with the way it's going.)

This is very exciting! Good stuff.

Given that this is "tier 0" and much of groundwork for being "more fully GPU driven", I now hope this will move to, well, being more GPU driven - GPU culling, GPU occlusion culling, meshlets, you know all that jazz. Eventually. Still, very good foundational piece!

Nice work everyone.

@Tim-C did you guys manage to test this with HDRP terrain demo project? i didn't get any performance improvement with it

If you have fallbacks built in, why not just have it enabled by default? I can't imagine most people will even be aware of this feature if it's buried in the settings behind a checkbox.

[screenshot: KeepALl]

When we have confidence that it’s working properly for people we will do this. There are some things called out in the post, though, that make it a bit difficult to turn it on always:

  • More shader variants
  • Potentially slower on GPU for lower tier mobiles

We want to be careful. Likely it will become always on in HDRP.

This project is using terrain instances and speed trees rather than mesh renderers. This is currently not supported via the fast path but is something we are working to add.

We will be continuing to improve this area of Unity ;)

Thanks Aras, hope you are well! We’re hoping to have a number of the things you describe added to the rendering layer.

Super interesting development! We've been investigating developing this using the Graphics.RenderMeshIndirect API, but it's great to see this built in. I'll definitely try to share some of our results.

Reading the commit (https://github.com/Unity-Technologies/Graphics/commit/e23606ac43245be74c2607dd1dd58aab21fc03ad?diff=unified) I'm a bit confused. Is it the case that GameObject rendering can now use BatchRendererGroup, and secondly that BRG can be done GPU driven/indirect?
Does this mean DOTS rendering gets this out of the box too?

Can't wait to learn what the other tiers are too!


Ahh, so I just tested it on the worst-case project then, gotcha


Give it a go with the HDRP spaceship project.

@Tim-C Sounds great and thanks for the article.

Just a few questions:
- Is this some kind of ECS approach? (something-something converting gameobjects)
- What the difference between DrawMeshInstanced/Indirect?
- If the best cases for ECS something dynamic, Resident Drawer for only static meshes, right? Then quite trivial question, how it will work with large forest landscapes (not 2-3 different types of trees, but 20-50)

I'm HDRP main, I tested various scenarios with trees mainly, I'd say even with usual gameobject it holds really well, obviously it skyrockets with ECS objects

~~The errors I got from this setting don't log to the console, only the log seemingly.~~

It mentions I've got a warning in the bottom bar of the editor:
[screenshot]
The aforementioned warning not being visible in the Console
[screenshot]

EDIT: I'm having this problem with all unity errors in this version, doesn't relate to this change. Please disregard.


We run a conversion process that captures the GameObject data and converts it to draws in the BatchRendererGroup. It’s persistent between frames and only executes when things change. It’s not ECS based.

[quote]
- What the difference between DrawMeshInstanced/Indirect?
[/quote]

Instanced issues draws; instanced indirect issues draws that can be modified with a compute shader before being executed by the GPU. So you can essentially modify on the GPU what you render.
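
A small sketch of that difference, in Python rather than Unity code. The argument record follows the usual layout of an indexed indirect draw; cull_pass is a plain function standing in for a compute shader, and all names and numbers here are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class IndirectArgs:
    """Typical argument record for an indexed indirect draw, stored in a GPU buffer."""
    index_count_per_instance: int
    instance_count: int
    start_index: int = 0
    base_vertex: int = 0
    start_instance: int = 0


def draw_instanced(index_count, instance_count):
    """Direct path: the CPU fixes the instance count at submit time."""
    return index_count // 3 * instance_count  # triangles submitted


def cull_pass(args, visible):
    """Stand-in for a compute shader that rewrites the args after the draw was recorded."""
    args.instance_count = sum(1 for v in visible if v)


def draw_instanced_indirect(args):
    """Indirect path: the instance count is read from the buffer when the draw executes."""
    return args.index_count_per_instance // 3 * args.instance_count


visible = [True, False, True, True, False]

# The CPU does not know which instances are visible, so it submits all five.
direct_triangles = draw_instanced(index_count=36, instance_count=len(visible))

# The same draw recorded indirectly, then trimmed before execution.
args = IndirectArgs(index_count_per_instance=36, instance_count=len(visible))
cull_pass(args, visible)
indirect_triangles = draw_instanced_indirect(args)

print(direct_triangles, indirect_triangles)
```

The point of the indirect form is that the decision about how much to draw can be made on the GPU, without a round trip to the CPU.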

[quote]
- If the best cases for ECS something dynamic, Resident Drawer for only static meshes, right? Then quite trivial question, how it will work with large forest landscapes (not 2-3 different types of trees, but 20-50)
[/quote]
Each tree type will be an instanced draw. SpeedTree and terrain support is still coming to the system, though. If they are standard mesh renderers it will be fine.
