Hi,
I’m 3 months away from releasing my first Steam game and I need to improve performance. I improved a ton on the scripting side, but rendering is just taking too long.
RenderPipelineManager.DoRenderLoop_Internal() takes around 8-16ms per frame depending on what the camera is looking at. I need to get it down to around 4-6ms to hit my 60FPS target.
But I don’t have enough experience to identify what the issues with my rendering are. I assume I need to reduce the number of draw calls or the number of vertices, but I don’t want to blindly try optimizing things without understanding them first.
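The 60FPS target translates into a fixed per-frame budget of 1000 / 60 ≈ 16.67 ms, and rendering has to share that budget with everything else in the frame. Below is a minimal Python sketch of the arithmetic. The function names are made up for illustration. It also assumes, as a simplification, that render time and the rest of the frame add up on one thread.

```python
def frame_budget_ms(target_fps: float) -> float:
    """Milliseconds available per frame to reach target_fps."""
    return 1000.0 / target_fps


def headroom_ms(target_fps: float, render_ms: float) -> float:
    """Milliseconds left for scripts, physics, etc. after rendering.

    Simplifying assumption: render_ms and the rest of the frame run on the
    same thread, so their costs simply add up.
    """
    return frame_budget_ms(target_fps) - render_ms


print(f"60 FPS budget: {frame_budget_ms(60):.2f} ms")  # 60 FPS budget: 16.67 ms
for render_ms in (16.0, 8.0, 6.0, 4.0):
    print(f"render {render_ms:4.1f} ms -> {headroom_ms(60, render_ms):5.2f} ms left")
```

At 16 ms of rendering, less than 1 ms is left for everything else. At 4-6 ms, roughly 10.7-12.7 ms remain.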
To give a bit of context:
I have 2 cameras, one for the 3D background and another for the 2.5D foreground.
My Profiler
I have things like the SRPBatcher.Flush taking 2.04ms on its own for the background camera
My Frame Debugger
Both the foreground and background cameras have a ton of draw calls.
And my game view stats
I see that I have 631 draw calls, millions of Tris and thousands of batches. But I can’t tell what parts are the issue.
Is there any way to see how many ms each draw call or draw group takes in the Frame Debugger? That would help me identify which parts are slow so that I can change them.
Could someone recommend some good learning material for this sort of thing?
I don’t know what version of Unity and URP you use, so some things may have different names on your end.
Some things I’ve noticed:
- That’s a lot of SetPass calls
- Something is probably breaking batching
- Those books and scrolls that are copy-pasted a hundred times?
- You want to make sure these are rendered using GPU instancing and not batching
- It’s the same use case as terrain grass/trees: one object rendered many times in different places
- That’s what GPU instancing is there for
- Is it perhaps the shadow maps (real-time shadows) that are taking so long?
- When using multiple cameras/render textures in URP, you may want to use a separate renderer asset for those extra cameras
- Do all of your cameras need to use all of the RenderFeatures like SSAO?
- Have you toggled various features (e.g. shadows, post-processing, extra cameras) while playing to figure out how much they cost?
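The next sketch shows why GPU instancing matters for objects like the copy-pasted books and scrolls. It is a rough Python model of the draw-call count for many copies of one mesh and material. Both the function and the per-draw instance cap are hypothetical. The real cap depends on the platform and the shader, so treat the numbers as illustrative only.

```python
import math


def draw_calls(copies: int, instanced: bool, max_instances_per_draw: int) -> int:
    """Rough draw-call count for `copies` objects sharing one mesh and material.

    Without instancing, each copy is its own draw call. With GPU instancing,
    copies are grouped into draws of up to `max_instances_per_draw` instances.
    `max_instances_per_draw` is a hypothetical cap, not a Unity constant.
    """
    if copies <= 0:
        return 0
    if not instanced:
        return copies
    return math.ceil(copies / max_instances_per_draw)


# 300 identical books, with a hypothetical cap of 250 instances per draw.
print(draw_calls(300, instanced=False, max_instances_per_draw=250))  # 300
print(draw_calls(300, instanced=True, max_instances_per_draw=250))   # 2
```

One caveat applies in URP. When a material is compatible with the SRP Batcher, Unity renders it with the SRP Batcher rather than GPU instancing. Toggling "Enable GPU Instancing" on such a material may therefore change nothing in the Frame Debugger.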
Here are some additional pointers:
Use the frame debugger to debug batching
- It will tell you why a new batch had to be started
- You want to minimize the number of SRP batches
- Reduce the number of shader variants you use
- The material variants feature released in 2022.1/URP 14 is invaluable for that
- It allows you to use a material as a master material and then create variants from it, which act like prefabs
- It makes it easy to ensure your game minimizes the number of different shader variants used
- I have 5 master materials, and all environment materials (minus terrain) are variants of them
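The SRP Batcher can only keep a batch going while consecutive draws use the same shader variant, so fewer variants means fewer SRP batches. Below is a toy Python model of that rule. Each draw is reduced to a (shader, keywords) pair, and a new batch starts whenever that pair changes. The model ignores the other batch-breaking reasons the Frame Debugger can report, and the Shader Graph name is hypothetical.

```python
from itertools import groupby

# A draw is modelled as (shader name, set of enabled keywords), i.e. one shader variant.
Draw = tuple[str, frozenset[str]]


def count_srp_batches(draws: list[Draw]) -> int:
    """Number of runs of consecutive draws that use the same shader variant."""
    return sum(1 for _variant, _run in groupby(draws))


def count_variants(draws: list[Draw]) -> int:
    """Number of distinct shader variants used by the draws."""
    return len(set(draws))


LIT = "Universal Render Pipeline/Lit"
plain = (LIT, frozenset())
normal_mapped = (LIT, frozenset({"_NORMALMAP"}))
book = ("Shader Graphs/Book", frozenset())  # hypothetical Shader Graph name

frame = [plain, normal_mapped, plain, normal_mapped, book, plain]
print(count_variants(frame))     # 3
print(count_srp_batches(frame))  # 6

# Same draws, grouped by variant: only one batch per variant.
ordered = sorted(frame, key=lambda d: (d[0], sorted(d[1])))
print(count_srp_batches(ordered))  # 3
```

Unity decides the actual draw order itself (render queue, distance sorting), so you cannot reorder draws in a project with a `sorted` call like this one. It only illustrates why grouping draws by variant, and using fewer variants, helps.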
Inspect triangle density
- You can use the old wireframe draw modes, the new Rendering Debugger, or RenderDoc
- What you’re interested in is triangle density
- Contrary to what many think, it’s (often) not the number of triangles that is the issue
- It can be, but modern GPUs can handle a lot of them
- Skinny triangles and small triangles are inefficient on your GPU
- The reason you want LODs is not so much to reduce the number of triangles, but to keep triangle density in check
- When a high-poly model is far away, it gets smaller on screen
- This means there will be tons of triangles contributing to the final color of a single pixel
- RenderDoc has some extra visualization modes Unity doesn’t: “Quad Overdraw” and “Triangle Size”
- RenderDoc is basically the grown-up version of Unity’s Frame Debugger
- It’s free
- The Unity editor comes with an integration
- Triangle Size shows exactly what I’ve been talking about
- My example isn’t too bad (terrain should be reduced), but RenderDoc does help you figure out where to look
- Quad overdraw is also important for performance
- Here we can see that the grass (especially) and the trees (less so) cause the same pixels to get redrawn over and over and over again
- You want to minimize that
- Use LODs
- Use billboards
- Look into impostor billboards, which are fancy billboards that work almost like magic
- However, you don’t want too many LODs
- They can cause more batches
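The point about small and skinny triangles can be checked with a toy rasterizer. GPUs shade pixels in 2x2 quads, so a triangle that covers only one pixel of a quad still pays for four pixel-shader invocations. The Python sketch below counts covered pixels and touched quads for a single triangle. It assumes quads aligned to even pixel coordinates and a simplified inclusive edge rule. The `triangles_per_pixel` helper is a deliberately rough density estimate, and its square-footprint model is an assumption.

```python
import math

Point = tuple[float, float]


def edge(a: Point, b: Point, p: Point) -> float:
    """Twice the signed area of triangle (a, b, p); its sign says which side of a->b p lies on."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def covered_pixels(tri: tuple[Point, Point, Point]) -> set[tuple[int, int]]:
    """Pixels whose centers fall inside the triangle.

    Simplified rasterizer: centers exactly on an edge count as covered
    (real GPUs use a top-left fill rule instead).
    """
    v0, v1, v2 = tri
    area = edge(v0, v1, v2)
    if area == 0:
        return set()  # degenerate triangle covers nothing
    xs = [v[0] for v in tri]
    ys = [v[1] for v in tri]
    pixels = set()
    for x in range(math.floor(min(xs)), math.ceil(max(xs))):
        for y in range(math.floor(min(ys)), math.ceil(max(ys))):
            c = (x + 0.5, y + 0.5)
            w = (edge(v1, v2, c), edge(v2, v0, c), edge(v0, v1, c))
            # Inside if every edge function has the same sign as the triangle's area.
            if all(wi * area >= 0 for wi in w):
                pixels.add((x, y))
    return pixels


def quad_efficiency(tri: tuple[Point, Point, Point]) -> float:
    """Fraction of pixel-shader invocations that land on covered pixels.

    Assumes every 2x2 quad (aligned to even coordinates) that the triangle
    touches runs 4 invocations, even if it covers only 1 pixel.
    """
    pixels = covered_pixels(tri)
    if not pixels:
        return 0.0
    quads = {(x // 2, y // 2) for x, y in pixels}
    return len(pixels) / (4 * len(quads))


def triangles_per_pixel(triangle_count: int, screen_height_fraction: float,
                        screen_height_px: int) -> float:
    """Very rough density: treat the object's footprint as a square whose side
    is screen_height_fraction * screen_height_px pixels."""
    side = screen_height_fraction * screen_height_px
    return triangle_count / max(side * side, 1.0)


big = ((0, 0), (64, 0), (0, 64))
skinny = ((0, 0), (64, 0), (0, 1))
tiny = ((0.2, 0.2), (0.9, 0.2), (0.2, 0.9))
for name, tri in (("big", big), ("skinny", skinny), ("tiny", tiny)):
    print(name, round(quad_efficiency(tri), 2))  # big 0.98, skinny 0.5, tiny 0.25

# Hypothetical 50,000-triangle model covering 5% of a 1080 px tall screen.
print(round(triangles_per_pixel(50_000, 0.05, 1080), 1))
```

Shrinking a mesh on screen pushes its triangles toward the "tiny" case, which is what happens as it moves away from the camera. Half or more of the shading work is then wasted, and this is the density problem that LODs counteract.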
Profile your shaders
- You may want to figure out which mesh takes too long to render
- You’ll have to use the native profiling tools of your GPU vendor for that
- I have an AMD GPU, so I’d have to use the Radeon GPU Profiler
- Bear some things in mind
- DO NOT profile the editor
- DO NOT profile development builds
- Make a release build and create your capture in there
- These are expert tools and the data they show you may be overwhelming
- It’s still useful to know which objects take too long to render
Thank you @TheSniperFan
I’m using Unity 2020.3.30f and URP 10.8.1
- Looking at the Frame Debugger, the most common reasons for breaking batching seem to be:
SRP: First call from ScriptableRenderLoopJob
SRP: Node material requires device state change
SRP: Node use different shader keywords
One thing I don’t understand is that sometimes consecutive draws with the exact same material are not batched, and the reason given is "First call from ScriptableRenderLoopJob", which doesn’t make sense to me.
Here I have passes 179 and 180. Both are books; I can’t tell what’s different or why they aren’t batched together.
Note that this is a custom shader I made with Shader Graph, and as you suggested, I enabled GPU instancing:
- Yes, the books are the same prefab, copy-pasted. The shader gives each one a random color depending on its position, picked from a gradient (all done in the shader).
Enabling or disabling GPU Instancing doesn’t seem to affect anything at all, so my assumption is that it can’t group them together.
But then I tried the same with the scrolls, which use the URP/Lit shader, and there was no change there either. So I’m not sure if I’m doing something wrong, or if URP or the Frame Debugger is acting up.
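A position-based random color in a shader usually comes down to hashing the object’s position and feeding the result into a gradient. We don’t know which nodes this particular Shader Graph uses. The Python sketch below therefore assumes the object’s x/z position as the seed and a two-stop gradient. The hash is the frac(sin(dot(...))) formula that Shader Graph’s Random Range node generates. Because the color is computed in the shader, every book can share one material and one shader variant instead of needing a material per color.

```python
import math

Color = tuple[float, float, float]


def frac(v: float) -> float:
    """HLSL-style frac(): v - floor(v), so the result lies in [0, 1)."""
    return v - math.floor(v)


def random_from_position(x: float, z: float) -> float:
    """Pseudo-random value from a position, using the common
    frac(sin(dot(seed, (12.9898, 78.233))) * 43758.5453) shader hash."""
    return frac(math.sin(x * 12.9898 + z * 78.233) * 43758.5453)


def lerp(a: Color, b: Color, t: float) -> Color:
    """Two-stop gradient: linear interpolation between colors a and b."""
    return tuple(ai + (bi - ai) * t for ai, bi in zip(a, b))


def book_color(object_position: tuple[float, float, float], start: Color, end: Color) -> Color:
    """Per-object color: hash the object's x/z position, then sample the gradient.

    Assumption: the seed is the object's position, not the per-pixel world position,
    so the whole book gets a single color.
    """
    x, _y, z = object_position
    return lerp(start, end, random_from_position(x, z))


dark_red = (0.4, 0.05, 0.05)
dark_blue = (0.05, 0.1, 0.4)
print(book_color((1.0, 0.0, 2.5), dark_red, dark_blue))
print(book_color((1.3, 0.0, 2.5), dark_red, dark_blue))  # a neighbouring book
```

GPU `sin` precision differs from Python’s, so the exact values will not match what the shader produces. The point is that the color is a pure function of position, with no per-object material data.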
That’s possible, all of these objects can receive and cast shadows.
I have baked point lights in the scene too from the candles close to the books.
I removed the 3rd camera that wasn’t supposed to be there.
I have two cameras: one for the background (perspective) and another for the foreground (orthographic).
The foreground camera is stacked on top of the background one:
I was not aware that each camera could use a different renderer asset. That looks like a good thing to look into to optimize each camera independently.
I’m using RenderFeatures for outlines, the stencil buffer for explosion masking, etc., so there’s definitely stuff I can remove for the background camera.
- Yes, there are a ton of things in the scene, so there’s no single main problem. It’s a combination of many, many little problems.
Thank you so much for the links I’ll make sure to read through them thoroughly.
The Frame Debugger is a lifesaver. I recommend using it from the start of a project rather than as a final step. I was developing a VR application, so it had to be highly optimized, and I found that baking lights and deactivating some unneeded post-processing had a very significant impact. See this one, for example, for the difference in impact.