This guide is HDRP focused, covering HDRP-specific optimizations like shadow caching.
Most tips on here apply to all render pipelines though.
I’ll start with some of the obvious.
- CPU: Draw calls/batches. The SRP Batcher makes material/SetPass calls less expensive, but they still eat performance. Even in the old render pipeline, manually combining meshes gives you much better performance than Unity's automatic static or dynamic batching. Use cell-based mesh combining so culling and LODs still work. (Although if you need a lot of materials, read about the SRP Batcher below; it's really good.)
tldr SRP batcher
The SRP Batcher is a draw call optimization that significantly improves performance for applications that use an SRP. It reduces the CPU time Unity requires to prepare and dispatch draw calls for materials that use the same shader variant. Note that it's not enough for materials to use the same shader; they need to use the same shader variant.
- Both: Real-time shadows! I believe HDRP supports shadow caching since 2021.1.
Unfortunately the cache is frustum-view based *(edit: only for the directional light, read the first comment), but you can still get massive gains by rendering shadows for static objects as "On Demand" (see the sketch below). To avoid shadows popping in, don't wait too long between updates; every half a second is usually good. For lights other than the directional light you probably don't need to update as often, since their caching isn't frustum based. Experiment and find the right interval for your project, and only enable shadows on the lights that actually need them.
Rendering shadows twice every 60 frames instead of every frame is a big CPU win.
NOTE: for the directional light (the only light type whose cached shadows depend on the camera frustum) you need to update quite a bit more often than usual, otherwise there will be popping issues.
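Here's a minimal sketch of how the on-demand update could be driven from a script, assuming the light has an HDAdditionalLightData component assigned and that a fixed interval (0.5 s here, shorter for the directional light) is good enough for your scene:

```csharp
using UnityEngine;
using UnityEngine.Rendering.HighDefinition;

// Sketch: re-render a cached "On Demand" shadow map on a timer.
// Assumes an HDAdditionalLightData reference assigned in the inspector;
// the 0.5s interval is an example value (the directional light may need less).
public class OnDemandShadowTicker : MonoBehaviour
{
    public HDAdditionalLightData lightData;
    public float interval = 0.5f;

    float timer;

    void Start()
    {
        lightData.shadowUpdateMode = ShadowUpdateMode.OnDemand;
    }

    void Update()
    {
        timer += Time.deltaTime;
        if (timer >= interval)
        {
            timer = 0f;
            lightData.RequestShadowMapRendering(); // redraws the cached shadow map this frame
        }
    }
}
```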
- CPU: IL2CPP! Use IL2CPP for production builds if your project works with it, and if it doesn't, try to make it work unless that's too much effort. It usually works out of the box unless you rely on one of the few things IL2CPP doesn't support. It's a big performance gain for very little work on your part, and it even speeds up the C# side of rendering, not just your scripts.
- Both: Find your bottleneck! Is it the CPU or the GPU? Make a development build with Autoconnect Profiler and check whether the CPU is waiting for the GPU; if it is, you're GPU bottlenecked.
- Both: Profiler! Use the profiler. It also has a GPU module, so you can use it to see what's eating your CPU and GPU performance. It's not a direct performance tip, but it's so easy to use that it's a no-brainer. Seriously, use it.
It is strongly recommended you read the following thread to learn the profiler: Other - Guide to unity profiler: HDRP version (And how to read profiler data) - Unity Forum
- GPU: VOLUMETRICS. They're enabled by default on every light; disable them and only enable them on the lights that actually need volumetrics. Otherwise it's wasted GPU performance. Nothing is free, and they're expensive.
- GPU: Dynamic resolution! The good options (FSR/DLSS/TAAU) are only available in 2021.2 and beyond. If someone is playing at anything higher than 1080p, especially 4K, it's a no-brainer. Maybe even enable it by default in that case, with a sensible screen percentage of course.
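As a rough illustration, dynamic resolution can also be driven from code through the SRP core's DynamicResolutionHandler. This sketch assumes dynamic resolution is already enabled on the active HDRP asset and in the camera's frame settings, and the 75% value is just an example starting point:

```csharp
using UnityEngine;
using UnityEngine.Rendering;

// Sketch: force a fixed screen percentage through the dynamic resolution system.
// Assumes dynamic resolution is enabled on the HDRP asset and camera frame settings;
// 75 is just an example value.
public class DynamicResDriver : MonoBehaviour
{
    [Range(50f, 100f)] public float screenPercentage = 75f;

    void Start()
    {
        DynamicResolutionHandler.SetDynamicResScaler(
            () => screenPercentage,
            DynamicResScalePolicyType.ReturnsPercentage);
    }
}
```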
- Both: Occlusion culling, although at some point, if your scene is big enough, it might be better to do manual occlusion culling.
TIP: Use the asset "Perfect Culling" from the Unity Asset Store. It's superior to Unity's Umbra occlusion culling in every way: it has little to no overhead, so it's worth it no matter how big your project is, and it does a much better job at the culling itself.
- Both: Camera far/near clipping plane. Set it to a sensible value based on your testing.
- Both: Dynamic render pass culling. Only 2021.2 and up.
This skips rendering passes based on what’s visible for the camera. Find it at the bottom of HDRP global settings.
- Both: Be careful about which features you use; don't just enable things without consideration.
For example, real-time reflection probes are far too expensive and are only really viable for testing and demos.
And maybe that artistic blur at 0.001 intensity needs to take a backseat and get disabled.
- Both: Forward/Deferred. Does your project use a lot of real-time lights? Then you should use deferred.
- Both: Many features cost quite a bit more at higher quality levels with little to no visual difference. Going from medium to high on many of them is a large performance cost for a very hard to notice visual upgrade. Medium is usually good.
- CPU: SRP Batcher: If your scene is heavy and you can't just reduce draw calls because your scene needs them, the SRP Batcher (enabled by default) is really good: solid CPU gains and a big improvement to draw call performance. When creating your level you can get away with using as many materials as you need, as long as they use the same shader variant, and in general try to use the smallest number of shaders you can get away with. Check the SRP Batcher documentation page; it has a lot of helpful info.
Edit 2:
- Both: This is less useful in HDRP since we have shadow caching, but still useful.
You can use shadow proxies, which are simple objects/cubes/planes with their mesh set to Shadows Only, and use those to set up your shadows. The benefit is that you use very few real-time shadow casters: disable shadow casting on most of your game objects and replace it with a few planes/cubes that create the shadows for your world. Obviously, it won't be as good or accurate.
- VRAM: In your camera frame settings (+global settings and active HDRP asset settings), check for features you aren’t using and disable them. Also helps with game size/VRAM.
- VRAM: In 2021.2 and earlier, every HDRP asset assigned under 'Quality' uses VRAM as if it were active. If you're only using one of them, remove the others.
- Both/Terrain: Are you using a big terrain? If you are, check out the ‘Pixel error’ parameter in the terrain settings. When your terrain is textured, it’s pretty hard to notice the difference. You can most likely reduce polygons by 10-35% or more while retaining the same visuals. Experiment and choose the right value for your project.
- VRAM(experimental, possibly not a good idea): HDRP has texture streaming, in case you need it – for those who didn’t know
- CPU: Keep an eye on how much garbage your own scripts generate each frame and keep it as low as possible (see the sketch below). Also check Incremental Garbage Collection, I believe it's in the "Player" settings; it's enabled by default in recent versions and usually beneficial to keep on.
HDRP rendering code should generate no garbage, if it does, report it as a bug.
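As an example of what "low garbage" looks like in practice, here's a minimal sketch that reuses a preallocated buffer with the non-allocating physics query instead of allocating a new array every frame (the radius and buffer size are arbitrary example values):

```csharp
using UnityEngine;

// Sketch: reuse a preallocated buffer and the non-allocating physics query
// instead of Physics.OverlapSphere, which allocates a new array on every call.
public class AllocFreeQuery : MonoBehaviour
{
    readonly Collider[] hits = new Collider[32]; // allocated once, reused every frame

    void Update()
    {
        int count = Physics.OverlapSphereNonAlloc(transform.position, 5f, hits);
        for (int i = 0; i < count; i++)
        {
            // process hits[i] here without generating garbage
        }
    }
}
```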
- CPU: There are things you don't really need to do every single frame, even if they live in Update. For anything that has to be in Update and is expensive, consider running it only every now and then, depending on your project of course (see the sketch below).
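A minimal sketch of that pattern, where RecalculateAIPaths is just a hypothetical stand-in for your own expensive work and 0.25 s is an example interval to tune per project:

```csharp
using UnityEngine;

// Sketch: run expensive logic on an interval instead of every frame.
// RecalculateAIPaths is a hypothetical placeholder for your own heavy work.
public class ThrottledUpdate : MonoBehaviour
{
    public float interval = 0.25f;
    float timer;

    void Update()
    {
        timer += Time.deltaTime;
        if (timer < interval) return;
        timer = 0f;

        RecalculateAIPaths(); // now runs ~4 times per second instead of every frame
    }

    void RecalculateAIPaths() { /* expensive work goes here */ }
}
```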
- Both: If you’re creating your own shaders, keep an eye on their complexity and performance costs, usually this isn’t an issue but it’s good to remember.
- Both: Check out your global settings; they come with a bunch of post processing enabled. Keep what you want and disable what you don't, in particular motion blur, which is quite expensive relative to the others.
- Both: In 2021.2 HDRP has volumetric clouds – keep the num primary steps and light steps as low as possible while retaining good visuals – primary steps in particular. A very low amount could lead to a lot of noise in the clouds, so find a good balance.
Edit 3: For foliage:
- Both: If you're using a high density of grass/trees, make sure they're instanced. HDRP supports terrain grass (which uses indirect instancing, good for performance) starting from 2021.2.
- Both: One idea that might help, depending on your needs, is to combine a few grass meshes into one prefab and use that prefab on the terrain. That way you place fewer grass instances but still get a very high density.
- Both: I mentioned this above but if you have a big terrain, remember to play with the “pixel error” parameter. It can massively reduce terrain polygons with little to no noticeable changes. Find a good value based on your project.
- Both: Don’t forget tree LODs and billboards!
- Both: Split up your TERRAIN/SCENES! If you have a big world, you probably need to stream it in and out; otherwise your terrain will have a big cost. The terrain object in Unity does its own culling, and it can get really expensive. You can split your terrain into 5, 10, or 20 pieces depending on how big your world is; there are assets on the store to do that, or you can do it manually. One basic approach is to split your world into scenes: rather than one massive scene, you have 2, 5, 10, etc. scenes based on your needs and load them with Unity's SceneManager.LoadSceneAsync API (see the sketch below). This also helps with your VRAM usage. Don't forget to unload scenes you're no longer using. This doesn't just apply to your terrain; depending on how you do it, you can include props, objects, and other meshes in each scene.
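A minimal sketch of distance-based streaming with that API; the scene name, distances, and player reference are placeholder assumptions for illustration, not a drop-in solution:

```csharp
using UnityEngine;
using UnityEngine.SceneManagement;

// Sketch: additively load a terrain/props chunk when the player gets close and
// unload it when they move away.
public class ChunkStreamer : MonoBehaviour
{
    public string sceneName = "Terrain_Chunk_03"; // hypothetical scene name
    public Transform player;
    public float loadDistance = 300f;

    bool loaded;

    void Update()
    {
        float dist = Vector3.Distance(player.position, transform.position);

        if (!loaded && dist < loadDistance)
        {
            SceneManager.LoadSceneAsync(sceneName, LoadSceneMode.Additive);
            loaded = true;
        }
        else if (loaded && dist > loadDistance * 1.2f) // hysteresis to avoid load/unload thrashing
        {
            SceneManager.UnloadSceneAsync(sceneName);
            loaded = false;
        }
    }
}
```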
Edit 4:
- Both: Consider level design changes if all else fails. If you're struggling with world streaming or performance, you might have to change your map in a way that allows occlusion culling to do a better job (tip: use Perfect Culling from the Asset Store; I'm not affiliated, it's just that good compared to Umbra). For streaming, you can create a buffer zone between point A and point B that lets you stream levels in and out while the player transitions through it. It's not ideal, but if you can't see another way, this can save your game.
- Both: Remember to provide a graphics settings menu on PC. It makes it possible for your game to scale down, and in general it's a nice thing to find in games.
- Both: Keep an eye on your VRAM usage. If a player has 8 GB of VRAM but your game requires 10 GB, that extra 2 GB is very expensive, since textures have to be shuffled between main memory and video memory, which is much slower than staying in VRAM. Performance will degrade massively, most of the time to unplayable levels.
- VRAM: Build your game, then close everything, including the Unity editor, browser, Steam, videos, etc.
Look at how much VRAM your computer is using while idle, then run your game and look again; subtracting the idle amount from the current amount gives you a rough idea of how much your game is using.
If your VRAM usage seems to be an issue, get the Memory Profiler package and look at exactly what is consuming your VRAM: how much comes from your project, and how much from the render pipeline. From there, work on reducing it.
- VRAM: To lower VRAM usage, you can use texture streaming (in HDRP texture streaming is still experimental, so possibly not a good idea yet), lower texture resolution, or split your world into multiple scenes and stream it, unloading scenes when appropriate. HDRP itself can sometimes be the issue, so you may have to work around it and make it consume less VRAM by changing settings and following the other tips here.
- Both: Mesh LODs. They make a massive difference depending on your scene and detail.
Imagine you're using 200 very detailed rock meshes. Past a certain distance they're often not even visible, and if they are, the detail is completely lost on them; they're an extremely small part of the screen. This applies to all meshes: cliffs, buildings, props, etc.
LODs also make it possible for you to make your objects high poly up close without completely sacrificing performance. Since most objects in your world aren’t very close to the player.
You can also completely cull objects after a certain distance.
LODs are quite important, but at the same time don't overdo it! Your small rock doesn't need 5 LODs! If it's small, you might be able to get away with just 2: the first is the base mesh, the second is completely culled.
Remember to judge your objects size and polycount when deciding on how many LODs to use. LOD transitions have a cost. Search around if you’re not sure how many LODs your mesh should have.
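If you prefer to set this up from code rather than on the LOD Group component in the inspector, here's a minimal sketch of the two-level case for a small prop (the 5% screen-height threshold is just an example value):

```csharp
using UnityEngine;

// Sketch: a two-level LOD setup for a small prop, built from code
// (base mesh, then culled once it gets small on screen).
public static class SmallPropLod
{
    public static void Setup(GameObject prop, Renderer baseRenderer)
    {
        var lodGroup = prop.AddComponent<LODGroup>();

        var lods = new[]
        {
            // visible until the prop covers ~5% of screen height, then culled entirely
            new LOD(0.05f, new[] { baseRenderer })
        };

        lodGroup.SetLODs(lods);
        lodGroup.RecalculateBounds();
    }
}
```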
- Both: Polygons: you'd be surprised how little visual difference there sometimes is compared to the same mesh with a lower polygon count. Depending on your project, textures do a lot of the heavy lifting in making your objects look detailed.
Again, this depends on your type of project, but unless you have a specific reason not to, keep your polycount in mind and set a budget for your game based on your own testing. (Remember players with weaker hardware than yours, if you're still targeting them.)
Edit 5:
- Both: Objects that cast shadows are treated as shadow casters even if they receive zero direct lighting and thus produce no visible shadows; shadow calculations still run for them!
This can have a massive GPU cost. You can find the "Cast Shadows" parameter in the inspector on your object's renderer, with the options Off, On, and Shadows Only. If you know your object never receives direct lighting, set it to Off. Also, if it's a small object with LODs, consider setting shadow casting to Off on the second or third LOD; decide based on the object and its size. (There's a code sketch after the example below.)
To give you an idea on how important this is, consider the following pics:
With shadow casting “on”:
With shadow casting “off”:
Over 4 ms on the GPU! On an RTX 2070 Super. And it provides nothing at all: none of these barrels are creating any shadows.
The scene for the above profiling:
The barrels are clusters, each “mesh” is around 6-7 barrels.
In the tests above, these barrels were switched between shadow casting on and off.
There’s only one light in the scene (directional light), with shadows enabled.
The barrels in the picture are 525 objects (some are on top of each other).
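For reference, here's a minimal sketch of flipping that setting from code, e.g. for a whole prop hierarchy; which renderers actually qualify is entirely up to you:

```csharp
using UnityEngine;
using UnityEngine.Rendering;

// Sketch: disable shadow casting on every renderer under a root object,
// e.g. props you know never receive direct light. This only shows the API;
// deciding which objects qualify is your call.
public static class ShadowCastingUtil
{
    public static void DisableShadowCasting(GameObject root)
    {
        foreach (var renderer in root.GetComponentsInChildren<Renderer>())
            renderer.shadowCastingMode = ShadowCastingMode.Off;
    }
}
```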
- GPU: Volumetric fog in HDRP can get extremely expensive. If you only care about volumetric fog from the directional light, lower the quality. You can do that in the fog post process. Set it to “low”, or even switch to manual control and lower it even more.
It can get crazy real fast.
Also, the reprojection denoiser is very expensive for volumetric fog. Consider using the Gaussian one.
If you care about volumetric fog from point/spot lights, then unfortunately you might have to live with some noise and flickering, unless you're willing to give up half or more of your GPU budget. Also consider lowering the volumetric fog distance if you're using punctual-light volumetric fog, to fade it out at a distance; otherwise volumetric fog will look very pixelated far away.
In my opinion, HDRP needs to improve volumetric fog performance so you can get better visuals at better performance, because currently you have to struggle with massive punctual-light volumetric fog flickering and noise, especially if anisotropy is above 0: the higher it is, the more problematic the noise/flickering becomes.
Edit 6:
Both: Texture atlasing - although HDRP has SRP batching, it's still faster to have fewer materials to begin with. Atlases are ideal for meshes that are scattered around at high densities, and they make it possible to combine all meshes that use the same material.
Ideally, you'd combine them intelligently based on a cell system, or manually: nearby meshes get combined into one, and so on. Otherwise, if everything becomes a single mesh, you lose frustum culling and occlusion culling. (You can use an asset like Mesh Combine Studio for cell-based mesh combining with LOD support; not affiliated.)
Note: Texture atlasing might increase your VRAM usage, keep an eye on it
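For the mesh-combining part, here's a minimal sketch of the core Unity API; a real setup would run this per cell/cluster (manually or via an asset) so culling and LODs keep working:

```csharp
using UnityEngine;
using UnityEngine.Rendering;

// Sketch: combine all child meshes that share one (atlased) material into a single mesh.
public static class CellMeshCombiner
{
    public static void CombineChildren(GameObject cellRoot, Material sharedMaterial)
    {
        var filters = cellRoot.GetComponentsInChildren<MeshFilter>();
        var combine = new CombineInstance[filters.Length];

        for (int i = 0; i < filters.Length; i++)
        {
            combine[i].mesh = filters[i].sharedMesh;
            combine[i].transform = filters[i].transform.localToWorldMatrix;
            filters[i].gameObject.SetActive(false); // hide the originals
        }

        var combinedMesh = new Mesh { indexFormat = IndexFormat.UInt32 }; // allow >65k vertices
        combinedMesh.CombineMeshes(combine);

        var combined = new GameObject("CombinedCell");
        combined.AddComponent<MeshFilter>().sharedMesh = combinedMesh;
        combined.AddComponent<MeshRenderer>().sharedMaterial = sharedMaterial;
    }
}
```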
GPU: Shadow filtering quality - HDRP has different shadow filtering quality options that control how soft the shadows are. "High" is the most expensive, and the real issue with high-quality shadow filtering is smaller lights (point/spot). With high filtering, these lights have a large performance cost, much larger than the directional light. Worst of all, that cost can't be cached, so it's paid every frame as long as the shadows are on.
The increased performance cost from high quality shadow filtering will be added to “deferred lighting”, shown in unity’s profiler with the GPU module enabled.
If your project utilizes point/spotlight shadows, consider switching to medium.
HDRP developers should split the option in two, one affecting the directional light and another for point/spot lights; the cost of high shadow filtering on the directional light isn't as crazy as it is on point lights.
Both: Baked lighting - You can set lights to baked, so those lights have zero runtime performance cost, and their shadows too if you want. You will lose specular lighting though; only real-time lights are capable of that without custom shaders.
The cost you'll pay is baking times and fighting Unity's lightmapper.
Both: Managing GPU usage for PC and console games | Unity - Good post from unity with a lot of helpful information to improve performance, some already mentioned here, but it’s a good read.
Both: Configuring your Unity project for stronger performance | Unity - Another good read from unity focused on performance.
CPU: Make sure “graphics jobs” is enabled. You can find the option in project settings > player settings. This will multithread rendering code, leading to a big improvement as the CPU graphics overhead on the main thread will be massively reduced. This is enabled by default but check to be safe.
Both: Shadow cascades - In HDRP, just like all other render pipelines, you have shadow cascades. To understand what they are, and their benefits check this page: Click Here
tldr on what shadow cascades are
Shadow cascades are a technique used in real-time 3D graphics to improve the quality and accuracy of shadows in a scene. The basic idea is to divide the viewable area of the scene (the “frustum”) into multiple sections, or “cascades,” and render the shadows separately for each cascade. Each cascade covers a larger area and is rendered with less detail and lower resolution than the previous cascade, with the closest cascade having the highest detail and resolution.
This approach allows for a balance between performance and quality, as the GPU can spend more resources on the cascades that are closest to the camera, where the shadows are most visible, while still providing adequate shadow detail for the rest of the scene. This technique is commonly used in video game engines and other real-time graphics applications to improve the visual quality of shadows and reduce the performance impact of rendering them.
Now, one thing you must know is that shadow cascades aren’t free and will cost you precious GPU and CPU cycles. To understand how many cascades you need to use, you have to consider your directional light shadow resolution, scene/map size, and shadow render distance.
In the end, test it for yourself – set it to different amounts and see if you like the result.
Shadow cascades should be a part of your game’s graphical settings page, as it’s a good way to scale down for weaker hardware.
The fewer cascades you use, the faster shadow rendering will be on your GPU and CPU.
Check this performance example of a middle-sized environment with many objects:
(This is only showing GPU performance, but it also affects CPU performance.)
4 cascades:
3 cascades:
2 cascades:
Both - SRP batcher & efficiency: There's an issue few people know about with the SRP Batcher: it separates draw calls that use different shader keywords, forcing a new draw call.
With SRP batcher, as long as you use the same shader (even with different materials), it will improve your performance by reducing the cost of drawcalls, but different keywords per material can reduce its efficiency.
To find out more, use the frame debugger; from it you can tell why some calls were separated and not combined with the previous SRP Batcher call. Usually the cause is a completely different shader, or a difference in shader keywords (imagine two materials using the same shader, but one material has receive SSR enabled and the other has it disabled).
Also, your materials keep keywords from previously assigned shaders even though they're now useless, which messes with the SRP Batcher even more.
To remove them, follow this video:
y5o2xp
If you have any “invalid keywords”, click on the “-” to remove.
To see the difference this can make, same scene, same camera view:
25 steps to render Gbuffer, 171 overall for the frame.
After some keywords changes:
Down to 8(!) steps for the Gbuffer, and 135 total.
Also, you can see in the pictures "Batch cause, SRP: node use different shader keywords" or "different shader"; this tells you why a call had to be its own draw call and wasn't part of the previous SRP batch, and so on.
In this scene, SRP batcher can be improved even further with more keyword changes & shader changes.
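If you want a quick way to spot suspicious keyword sets without clicking through every material, here's a minimal sketch that simply logs the shader and enabled keywords for the materials under an object (the frame debugger remains the real source of truth):

```csharp
using UnityEngine;

// Sketch: log the shader and enabled keywords for every material under this object,
// to help spot materials that share a shader but differ in keywords (which breaks
// SRP batching).
public class MaterialKeywordLogger : MonoBehaviour
{
    [ContextMenu("Log Material Keywords")]
    void LogKeywords()
    {
        foreach (var renderer in GetComponentsInChildren<Renderer>())
        {
            foreach (var mat in renderer.sharedMaterials)
            {
                if (mat == null) continue;
                Debug.Log($"{mat.name} ({mat.shader.name}): {string.Join(", ", mat.shaderKeywords)}");
            }
        }
    }
}
```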
misc info:
combined meshes: Combining meshes can improve CPU performance through reduced draw calls if done on the right objects, but it might slightly increase your GPU cost. Always use Unity's profiler (CPU & GPU modules) to get exact numbers and check it's worth it for your particular scene/project.
General quality: Your game quality settings should have hardcoded settings for general features, such as SSAO, SSR, Shadow distance, etc. (You can also expose them in an advanced graphical settings page)
For example, you can customize how many samples SSAO does, full or quarter resolution, etc. - This goes for the majority of settings in HDRP.
With HDRP, you get access to a lot more options. This will help scale down performance as required, or push visuals to the extreme depending on the user’s hardware.
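A minimal sketch of the settings-menu side of this, using Unity's built-in quality levels (each of which can reference a different HDRP asset with its own SSAO/SSR/shadow settings); the level index is whatever your menu exposes under Project Settings > Quality:

```csharp
using UnityEngine;

// Sketch: apply a quality level chosen in a graphics settings menu.
public static class GraphicsSettingsMenu
{
    public static void ApplyQualityLevel(int index)
    {
        QualitySettings.SetQualityLevel(index, true); // true = apply expensive changes immediately
        Debug.Log($"Active quality level: {QualitySettings.names[index]}");
    }
}
```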
Profiler & frame debugger: Improving performance has 1 extremely important requirement: Knowing your bottleneck, knowing the performance cost of your scene.
It is absolutely essential that you learn to use the Unity profiler, both the CPU and GPU modules, and to a lesser degree the frame debugger, which shows exactly how many steps it takes to render a single frame of your project. The frame debugger also lets you debug the SRP Batcher and improve its efficiency, leading to better performance (see "SRP batcher & efficiency" above).
How to read profiler data, meaning of GPU metrics (HDRP):
How to read Unity Profiler metrics and understand them (Click me)
1. "ForwardOpaque": The cost of your opaque objects (the majority or all of the objects placed in the scene). This is driven by poly count; MSAA cost also goes here, and possibly draw calls.
2. "RenderShadowMaps": The cost of rendering all shadows in your scene. This is affected by the number of objects that cast shadows, their polycount, the shadow render distance, and the number of shadow cascades you're using. (Doesn't include Contact Shadows if you have them enabled.)
3. “Volumetric Lighting”: This is the cost of your volumetric fog, decided by the quality options chosen in the fog post process override, also affected by the number of lights with “volumetrics” enabled. The denoiser selected in the fog override has a cost as well.
4. "Volumetric Clouds": Cost of using volumetric clouds, affected by the number of primary steps and light steps selected in the Volumetric Clouds post process override.
5. “Post Processing”: This is the cost for some of the post processing available in HDRP, like Bloom, Exposure, motion blur, etc.
6. “ForwardDepthPrepass”: This is the cost of doing a DepthPrepass in forward mode.
What is a depth prepass? A depth pre-pass eliminates or significantly reduces geometry rendering overdraw. In other words, any following color pass can reuse this depth buffer to have one fragment shader invocation per pixel. This is because a pre-populated depth buffer contains the depths of the opaque geometry closest to the camera. The subsequent passes will shade only the fragments passing the z test with matching depths and avoid expensive overdraw.
7. "Contact Shadows": Cost of doing contact shadows, decided by the quality options in its post process override.
8. "Ambient Occlusion": Cost of doing SSAO, decided by its post process override quality options.
9. “ObjectsMotionVector”: Cost of object motion vectors, decided by the amount of meshes with object motion vector (like animated grass).
10. “ColorPyramid”: Not 100% sure, but I believe this is decided by the “color buffer format” and/or “Buffer Format” in your HDRP asset.
11. "BuildLightList": Cost of building the light list for your scene, decided by the number of active real-time lights in your scene and possibly their range.
12. “OpaqueAtmosphericScattering”: This cost comes from your fog override. (HDRP).
13. “CopyDepthBuffer”: copies depth buffer
Deferred GPU metrics are very similar with some changes:
ForwardOpaque is split into multiple metrics in deferred mode:
1. "Deferred Lighting": Handles the lighting cost. This is affected by the number of real-time lights you have and, most importantly, their range. Range makes a big difference in deferred: you can have many lights with very little performance cost as long as their range is small. The bigger the range, the more expensive.
2. GBuffer: Cost of your rendered objects, affected by polygon count.
To learn more on how to use the profiler, check this thread: Other - Guide to unity profiler: HDRP version (And how to read profiler data) - Unity Forum