Guide to the Unity Profiler: HDRP version (and how to read GPU & CPU profiler data)

Unity’s profiler is intuitive and easy to understand, yet very powerful.
The GPU module is explained first; the CPU module is covered second, at the bottom.
To open the profiler, go to Window > Analysis > Profiler (Ctrl+7 on Windows):
[embedded video: m8hkwm]

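The profiler has multiple modules; this thread focuses on the CPU and GPU modules.
The CPU module is enabled by default. To enable the GPU module, open the “Profiler Modules” dropdown at the top left of the Profiler window and tick “GPU Usage”:
[embedded video: ubsh74]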

To view GPU profiler data, or CPU data, simply click on the chart of the module you want to view. In the above video you can see me clicking on the GPU module, which shows GPU performance information.

One important thing to know is that the editor adds CPU overhead, so the CPU numbers you see when profiling in the editor will be worse than in an actual build.

The GPU module, however, has no overhead: what you see is exactly what you will get in a build.
To get more accurate CPU information when needed, make a build with “Development Build” and “Autoconnect Profiler” ticked in the Build Settings window. Once it’s built, open your game and the profiler will connect to it automatically.

Worth mentioning: GPU profiling is not possible in a build when using “Graphics Jobs” (found in Project Settings > Player). Graphics jobs move a lot of the CPU rendering overhead off the main thread and onto separate threads. This also only applies in a build.

Now, how do you read and understand the performance information? It’s actually very simple.

Look at this image; it shows GPU data using HDRP in “forward” mode:

Now, to understand exactly what each metric means and what affects it:
GPU/Forward/HDRP:

1. “ForwardOpaque”: The cost of your opaque objects (the majority, if not all, of the objects placed in your scene). This is decided by poly count; MSAA cost also goes here, and possibly draw calls.
2. “RenderShadowMaps”: The cost of rendering all shadows in your scene. This is affected by the number of objects that cast shadows, their poly count, the shadow render distance, and the number of shadow cascades you’re using. (Doesn’t include contact shadows if you have them enabled.)
3. “Volumetric Lighting”: This is the cost of your volumetric fog, decided by the quality options chosen in the fog post process override and by the number of lights with “volumetrics” enabled. The denoiser selected in the fog override has a cost as well.
4. “Volumetric Clouds”: The cost of using volumetric clouds, affected by the “Num Primary Steps” and “Num Light Steps” values selected in the volumetric clouds post process override.
5. “Post Processing”: This is the cost for some of the post processing available in HDRP, like Bloom, Exposure, motion blur, etc.
6. “ForwardDepthPrepass”: This is the cost of doing a DepthPrepass in forward mode.
   What is a depth prepass? A depth prepass eliminates or significantly reduces overdraw when rendering geometry. In other words, any following color pass can reuse this depth buffer to get one fragment shader invocation per pixel. This works because the pre-populated depth buffer contains the depths of the opaque geometry closest to the camera. The subsequent passes shade only the fragments that pass the z test with matching depths, and so avoid expensive overdraw.
7. “Contact Shadows”: The cost of contact shadows, decided by the quality options in its post process override.
8. “Ambient Occlusion”: The cost of SSAO, decided by the quality options in its post process override.
9. “ObjectsMotionVector”: The cost of object motion vectors, decided by the number of meshes that use object motion vectors (like animated grass).
10. “ColorPyramid”: Not 100% sure, but I believe this is decided by the “color buffer format” and/or “Buffer Format” in your HDRP asset.
11. “BuildLightList”: The cost of building the light list for your scene, decided by the number of active realtime lights in your scene and possibly their range.
12. “OpaqueAtmosphericScattering”: This cost comes from your fog override. (HDRP).
13. “CopyDepthBuffer”: Copies the depth buffer :)
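
To make the depth prepass from item 6 more concrete, here is a minimal Python sketch that counts fragment shader invocations with and without a prepass. It is a toy model, not HDRP code: the function name, the (pixel, depth) fragment representation, and the example scene are all assumptions made for illustration.

```python
def count_shaded_fragments(fragments, depth_prepass):
    """Count fragment shader invocations for a list of opaque fragments.

    fragments: (pixel_id, depth) tuples in draw order, smaller depth = closer.
    This is a hypothetical toy model of a depth-tested color pass.
    """
    depth_buffer = {}
    if depth_prepass:
        # Pass 1: depth only. Nothing is shaded, we just keep the closest depth.
        for pixel, z in fragments:
            if z < depth_buffer.get(pixel, float("inf")):
                depth_buffer[pixel] = z
        # Pass 2: shade only fragments whose depth matches the stored depth.
        return sum(1 for pixel, z in fragments if z == depth_buffer[pixel])

    # No prepass: every fragment that passes the depth test gets shaded,
    # even if a closer surface overwrites it later (overdraw).
    shaded = 0
    for pixel, z in fragments:
        if z < depth_buffer.get(pixel, float("inf")):
            depth_buffer[pixel] = z
            shaded += 1
    return shaded


# Three opaque surfaces covering the same 4 pixels, drawn back to front
# (the worst case for overdraw).
fragments = [(pixel, z) for z in (0.9, 0.5, 0.1) for pixel in range(4)]
no_prepass = count_shaded_fragments(fragments, depth_prepass=False)
with_prepass = count_shaded_fragments(fragments, depth_prepass=True)
print(no_prepass, with_prepass)
```

In this example the prepass brings shading down to one invocation per pixel, at the price of drawing the geometry one extra time for depth only. That extra draw is the cost that shows up under “ForwardDepthPrepass”.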

Deferred mode GPU metrics are very similar, with some changes.
ForwardOpaque is split into multiple metrics in deferred mode:
1. “Deferred Lighting”: Handles lighting costs. This is affected by the number of realtime lights you have and, most importantly, their range. Range makes a big difference in deferred: you can have many lights with very little performance cost as long as their range is small. The bigger the range, the more expensive the light.
2. “GBuffer”: The cost of your rendered objects, affected by polygon count.
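
To get a feel for why light range matters so much in deferred mode, here is a rough Python sketch that estimates how many screen pixels a point light’s bounding sphere covers. It assumes the lighting cost grows with the number of pixels a light touches, which is a simplification and not how HDRP actually builds its light lists. The function name, the 60 degree field of view and the 1920x1080 resolution are all assumptions for illustration.

```python
import math


def light_pixel_coverage(light_range, distance, fov_deg=60.0,
                         width=1920, height=1080):
    """Rough number of screen pixels inside a point light's bounding sphere.

    Hypothetical toy model: projects the sphere radius with a simple
    perspective formula and clamps the result to the screen size.
    """
    half_fov = math.radians(fov_deg) / 2.0
    # Projected radius in pixels for a sphere `distance` units from the camera.
    radius_px = (light_range / (distance * math.tan(half_fov))) * (height / 2.0)
    area_px = math.pi * radius_px ** 2
    return min(area_px, width * height)


small = light_pixel_coverage(light_range=2.0, distance=20.0)
large = light_pixel_coverage(light_range=4.0, distance=20.0)
print(round(small), round(large), large / small)
```

In this model, doubling the range roughly quadruples the number of pixels the light touches (until it fills the screen), which fits the observation that many small lights can be cheaper than a few large ones.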

CPU data is read the same way, except that the “GPU ms” column is replaced with “Time ms”.
Now, let’s explain the CPU metrics.
[embedded video: fcppxh]

  • CPU is a bit more complex, in that there are more metrics scattered around, but don’t worry: it’s still simple to understand.

— CPU module section —
We will dive into the CPU cost of rendering. Your script costs might also show up when profiling your game, but we will only talk about rendering costs here.
Notice that the CPU rendering cost is overhead. You can reduce it by cutting batches/draw calls, making better use of the SRP Batcher (by using fewer unique shaders, and using the same shader variant across your materials), culling lights that are far away, and reducing your use of realtime shadows. You can also use shadow caching.
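
As a rough illustration of why fewer shader variants help the SRP Batcher, here is a small Python sketch that counts batches under the simplifying assumption that a new batch starts whenever the shader variant changes between consecutive draws. The function name and the variant names are hypothetical, and the real SRP Batcher has further compatibility rules that are not modelled here.

```python
def count_batches(draws):
    """Count batches for a list of shader variant names in draw order.

    Simplified, hypothetical model: a batch is a run of consecutive draws
    that share the same shader variant.
    """
    batches = 0
    previous = None
    for variant in draws:
        if variant != previous:
            batches += 1
            previous = variant
    return batches


# Hypothetical variant names, listed in draw order.
mixed = ["Lit", "Unlit", "Lit", "Hair", "Lit", "Unlit"]
unified = ["Lit"] * 6

print(count_batches(mixed))          # many variants, interleaved
print(count_batches(sorted(mixed)))  # same variants, grouped together
print(count_batches(unified))        # one variant for everything
```

Under this model, the same six draws cost six batches when the variants are interleaved, three when they are grouped, and one when every material shares a single variant.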

First, there are “Editor loop” and “Player loop”.
“Editor loop” is the editor’s CPU cost (overhead) that ends up in your profile data. It doesn’t capture all of the overhead, though, so don’t assume you can subtract it to get the performance of a real build.

“Player loop” is the CPU cost of your scene; almost everything is inside this metric. In a build, this will perform better than in the editor.
When you expand “Player loop”, you get this:

Basically, you can ignore everything except the first metric.
“RenderPipelineManager.DoRenderLoop_Internal” is where the rendering CPU cost goes, so we’ll dive deep into this one. As you can see, almost all of the CPU cost is in there.

When you expand it, you get two important metrics, both with “render main camera” in their names. They’re the most expensive. You can see them here:

Both contain similar (or the same) metrics, so we will talk about them as one.
Now, let’s explain the metrics you will find in there. Note that many of the metrics we’re talking about may be nested inside other metrics.
These results are with a deferred renderer; forward will be mostly similar, with slight changes.

1. “Inl_ExecuteRenderGraph”: This is where you will find most of the rendering metrics.
2. “Inl_RecordRenderGraph”: HDRP uses a render graph, which simplifies some graphics wizardry and can lead to reduced VRAM usage. There’s little to nothing you can do to reduce its cost, though enabling “Dynamic Render Pass Culling” in the HDRP global settings can possibly improve its performance slightly.
3. “PrepareLightsForGPU”: This does some graphics wizardry for realtime/mixed lights in HDRP. The more lights you have, the more this will cost. Since 2022.1 this has seen big performance improvements, as HDRP uses jobs and Burst in the background to speed it up.
4. “Shadows.PrepareDrawShadows”: This is similar to PrepareLightsForGPU, but for shadows. The more realtime shadows you have, the more it will cost. This was optimized in 2023.1 and uses Burst/jobs for better CPU performance.

The following are metrics inside “ExecuteRenderGraph”:
5. “RenderShadowMaps”: This is the CPU overhead for realtime shadows. The more shadow-casting objects you have, and the more lights that cast shadows, the more expensive it will be.
6. “GBuffer”: Overhead; as with the GPU GBuffer metric, it depends on the number of objects and the polygon count. Make sure you’re using the SRP Batcher well to reduce its cost.
7. “DeferredDepthPrepass”: Does a depth prepass for the deferred renderer.
8. “Deferred Lighting”: This cost comes from realtime lights, and their range.
9. “ObjectsMotionVector”: This cost comes from objects with their motion vector mode set to Object mode. The more objects you have set to Object mode, the more expensive it is. Things that need object motion vectors are animated meshes, such as grass. (Motion vectors help TAA and motion blur give better results.)
10. “ForwardTransparent”: Transparent objects can’t be rendered in deferred mode, so even if you’re using deferred mode, HDRP will render transparent objects in forward mode.
11. “PostProcessing”: Cost of post processing overhead.
12. “DBufferRender”: Decal cost (when using deferred).

Notice that not everything is in the profiler; some costs are not shown. This goes for both GPU and CPU.
Also, your script costs can be found in the CPU module, but to dig deep into scripts you might have to enable “Deep Profile”, which comes with a large CPU profiling overhead.

If you want to know how to improve runtime performance, check this thread: “Mega runtime Performance tips thread (unity & HDRP) - Guide to better runtime unity performance”.

Added section explaining CPU module as well

Great Post

Awesome contribution.