Mega runtime performance tips thread (Unity & HDRP) - Guide to better runtime Unity performance

The gains can be massive when you’re using P/Invoke, which is used by a lot of the Unity codebase. I’ve done texture operations in C code, and the speedup was just immense. Talking 1000x improvement.

Another edit, added level design tip & some VRAM considerations
If anyone has other tips or just info to consider when it comes to performance, especially for HDRP, please share :)

I generally keep my shadow distance low (for the directional light). If you are using real-world scale, I keep the shadow distance at 80-200 m and set the resolution to 512 for 80 m and 1024 for greater distances. For distant shadows I use only contact shadows, with the distance set to more than 10 km, and to be honest there is no visible difference, plus the shadows of very distant objects stay visible thanks to the contact shadows! I have only tested these settings in a forest scene, so I don't know how they play out in yours. For grass, disable shadows and again just use contact shadows. Keep the tri count of your large trees between 4k and 10k (as low as possible in that range, especially for open worlds) if you are covering old-gen hardware like PS4; for next gen you can go to about 10k-25k. Limit the number of textures and materials, and use atlases and virtual texturing. Use simplified proxy shadow casters for your vegetation (most open-world games on PS4, like Horizon Zero Dawn and Ghost of Tsushima, did it). Reuse your vegetation assets a lot and use just two or three variations per tree type (look at RDR 2: lots of reused assets across the world). Use the new instanced terrain tools for performance.
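
To get a feel for why resolution is paired with distance here, this is a rough sketch of how coarse one shadow-map texel becomes. It assumes, as a simplification, that a single shadow map covers the whole shadow distance (no cascades, no filtering); `shadow_texel_size_m` is a made-up helper name, not a Unity API.

```python
def shadow_texel_size_m(shadow_distance_m: float, resolution_px: int) -> float:
    """Approximate world-space width of one shadow-map texel, in meters.

    Simplifying assumption: one shadow map is stretched over the whole
    shadow distance (cascades and filtering are ignored).
    """
    return shadow_distance_m / resolution_px


# Pairs taken from the post above, plus one mismatched pair for comparison.
for distance_m, resolution_px in [(80, 512), (200, 1024), (200, 512)]:
    texel_cm = shadow_texel_size_m(distance_m, resolution_px) * 100
    print(f"{distance_m} m at {resolution_px} px: about {texel_cm:.1f} cm per texel")
```

Under that assumption, 80 m at 512 and 200 m at 1024 land at a similar texel size, while 200 m at 512 is roughly twice as coarse.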

How do you change the resolution in respect to shadow distance? Never saw such a setting in HDRP.

In the pipeline settings under Lighting > Volumetrics, the max local fog on screen is set to 64 by default. At high resolution this takes 710MB of VRAM. You can probably set this to a more sane value like 3 or something and save hundreds of megabytes.
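
As a back-of-the-envelope check on the numbers above, here is a small sketch. It takes the 710 MB figure at face value and assumes the memory scales linearly with the slot count, which is my assumption and not something confirmed for HDRP; the function name is made up for illustration.

```python
def estimated_fog_vram_mb(slots: int, measured_mb: float = 710.0,
                          measured_slots: int = 64) -> float:
    """Estimate VRAM for a given max-local-fog-on-screen value.

    Assumption: memory scales linearly with the number of slots, based on
    the single measurement quoted in the post (710 MB at 64 slots).
    """
    return measured_mb / measured_slots * slots


for slots in (64, 16, 3):
    used = estimated_fog_vram_mb(slots)
    saved = estimated_fog_vram_mb(64) - used
    print(f"{slots} slots: about {used:.0f} MB used, {saved:.0f} MB saved")
```

Check the real number in the memory profiler after changing the setting, since the actual allocation may not be linear.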

It does this by default. They’re called cascades.

I am using cascades, but I understood that all cascades use the same resolution for the shadow map. Or is this configurable? If yes, where?

Think twice before using any screen-space effects in Unity, especially SSGI and SSR, as they can make your game crawl! They are extremely heavy, and quality-wise not that great even at ultra. SSGI produces lots of noise and some artifacts in low-light conditions like a cloudy day or dimly lit rooms, even at full resolution, and again it is extremely heavy. SSR isn’t usable at all at medium or low because of some horrible lining artifacts; it gives the expected results at high and ultra, but then it’s extremely heavy and still not that great compared to the legacy pipeline’s SSR! I recommend using Asset Store alternatives for better quality and performance. These effects really need to be improved, because they don’t seem production ready at all!

I figured out how to improve SSGI in cloudy conditions by baking a single Enlighten probe with a “ground” cube below it. This means that it doesn’t sample the skybox from below on ray miss.

SSGI is heavy, yes, but it also adds a lot to my game. SSR isn’t that heavy in my case? But I have noticed those artifacts.

Thanks!! Can you provide a video showing that SSGI isn’t producing any noise in your environment? I tried setting the SSGI fallback to reflection probes only, but it didn’t work (this is not the same as your technique, but shouldn’t it also avoid sampling the skybox from below?)

It’s not quite free of artifacts of course, but if you go look for them you see artifacts in most games.

Hi there!
Cool thread! Just wanted to add some notes (although mostly applicable to URP and mobile development).
I’ve had quite a lot of success using Amplify Impostors for mobile games.
Traded around 400k tris for a couple of MB texture memory and extra (instanced SRP batched) drawcalls.
Highly recommended!

Also for some mobile devices, it can be good to decrease the render scale below 1 as some have very high DPI.
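
The reason a small render scale change helps so much is that it applies to both width and height, so the pixel count drops with the square of the scale. A minimal sketch (the 2400x1080 screen size is just an example, and `scaled_pixel_count` is a made-up helper):

```python
def scaled_pixel_count(width_px: int, height_px: int, render_scale: float) -> int:
    """Number of pixels actually rendered at a given render scale."""
    return round(width_px * render_scale) * round(height_px * render_scale)


native = scaled_pixel_count(2400, 1080, 1.0)
for scale in (1.0, 0.8, 0.7):
    pixels = scaled_pixel_count(2400, 1080, scale)
    print(f"render scale {scale}: {pixels} px ({pixels / native:.0%} of native)")
```

So a render scale of 0.8 already shades only about two thirds of the native pixels.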

Moreover, keep an eye on the development of adaptive performance.

Made an addition about “shadow casting” and its effect on performance + talked about volumetric fog cost.

  • Both: Objects with shadow casting enabled are treated as shadow casters even if they receive no direct lighting and therefore cast no visible shadows; the shadow calculations still run for them!
    This has a massive cost, especially on the GPU. You can find the “cast shadows” parameter in the inspector for your object, with the options “off, on, shadows only”. If you know your object never receives direct lighting, set it to off. Also, if it’s a small object with LODs, consider setting “shadow casting” to off on the second or third LOD. You decide based on the object and its size.

To give you an idea on how important this is, consider the following pics:

With shadow casting “on”:

With shadow casting “off”:

Over 4ms on the GPU! On an RTX 2070 super.

The scene for the above profiling:

The barrels are clusters; each “mesh” is around 6-7 barrels.
For the above tests, these barrels were switched between shadow casting on and off.

The barrels in the picture are 525 objects (some are on top of each other).
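
To put the roughly 4 ms difference in context, here is a quick sketch of how much of a frame budget that is at common frame-rate targets. The helper name is made up, and the 4 ms is simply the figure quoted above.

```python
def share_of_frame_budget(cost_ms: float, target_fps: float) -> float:
    """Fraction of one frame's time budget consumed by a given GPU cost."""
    frame_budget_ms = 1000.0 / target_fps
    return cost_ms / frame_budget_ms


for fps in (30, 60, 120):
    share = share_of_frame_budget(4.0, fps)
    print(f"4 ms at {fps} fps: {share:.0%} of the frame budget")
```

At a 60 fps target that is about a quarter of the whole frame spent on shadows nobody sees.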

Worth mentioning: Modern GPUs can render tens of millions of polygons pretty fast!
The above barrels are all LOD0 with all other LODs deleted.
With shadow casting off, they’re about 15M tri and 29M vert!

They’re all rendered in 3.5ms with forward mode, and about the same with deferred.
GPUs can handle even more, but the biggest performance cost here for these props might not be the crazy amount of polygons, but small triangles. They’re all LOD0 and fairly small, so far away barrels end up with very small triangles that increase GPU cost quite a bit.

So, you might not need 2-4 LODs; you can get away with a single LOD1 and then cull after that. The purpose of LOD1 is to reduce the complexity of objects at a distance. HDRP actually has debug modes you can use for this, called “vertex density” and “quad overdraw”. You can also add an imposter as the final LOD if you don’t want to cull it.
In the end it depends on object size, your map, etc.
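
The "LOD0, then one LOD1, then cull" idea can be sketched as a choice based on how tall the object appears on screen. This is only a rough approximation and not Unity's exact LODGroup computation; the thresholds (0.25 and 0.05) and function names are made up for illustration.

```python
import math


def screen_relative_height(object_height_m: float, distance_m: float,
                           vertical_fov_deg: float) -> float:
    """Approximate fraction of the screen height the object covers."""
    visible_height_m = 2.0 * distance_m * math.tan(math.radians(vertical_fov_deg) / 2.0)
    return object_height_m / visible_height_m


def pick_lod(object_height_m: float, distance_m: float,
             vertical_fov_deg: float = 60.0) -> str:
    """Pick LOD0, LOD1, or culled using hypothetical thresholds."""
    height = screen_relative_height(object_height_m, distance_m, vertical_fov_deg)
    if height >= 0.25:   # hypothetical LOD0 threshold
        return "LOD0"
    if height >= 0.05:   # hypothetical LOD1 threshold
        return "LOD1"
    return "culled"


# A 1 m tall barrel at increasing distances.
for distance in (2, 10, 50):
    print(distance, "m ->", pick_lod(1.0, distance))
```

A small prop like a barrel falls below the thresholds quickly, which is why it can be culled or swapped for an imposter much earlier than a big cliff.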


(This is vertex density debug mode in HDRP)
Notice how all the barrels from a distance are completely red, while the big cliffs and nearby barrels are fine.
The cliffs are huge and not polygon dense for their size (originally a small cliff scaled up), but the barrels are small and very high detail.
You want to avoid that red as much as you can with LODs/imposters/culling.


(Quad Overdraw)
This displays small/thin triangles.

edit: This doesn’t mean LODs aren’t important, or that you should only have one. Adding LODs will make a massive difference when it comes to performance. Don’t misunderstand this post :) (don’t be afraid to have 3-4 LODs, just don’t go crazy, and consider the object you’re making LODs for and its size in the world)

Added 3 new tips: (Texture atlasing, HDRP shadow filtering quality, Baked lighting)


Both: Texture atlasing - although HDRP has SRP batching, it’s still faster to have fewer materials to begin with. Atlases are ideal for meshes that are scattered around at high densities. They also let you combine all meshes that use the same material.
Ideally, you’d combine them intelligently, either manually or based on a cell system, so that nearby meshes combine into one and so on. Otherwise, if all of them become a single mesh, you’d lose frustum culling & occlusion culling. (You can use Mesh Combine Studio for cell-based mesh combining with LOD support *not affiliated)
Note: Texture atlasing might increase your VRAM usage, so keep an eye on it.
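
The cell idea can be sketched as bucketing mesh positions into a grid and combining only the meshes that share a bucket. This is a minimal illustration of the grouping step only, not how any particular asset implements it; the 20 m cell size and the function name are made up.

```python
import math
from collections import defaultdict


def group_by_cell(positions, cell_size_m):
    """Group mesh indices by the XZ grid cell their position falls into.

    positions: list of (x, y, z) world positions, one per mesh.
    Returns a dict mapping (cell_x, cell_z) to a list of mesh indices.
    """
    cells = defaultdict(list)
    for index, (x, _y, z) in enumerate(positions):
        key = (math.floor(x / cell_size_m), math.floor(z / cell_size_m))
        cells[key].append(index)
    return dict(cells)


positions = [(1, 0, 1), (5, 0, 8), (25, 0, 3), (-2, 0, 4)]
for cell, mesh_indices in group_by_cell(positions, 20).items():
    print(f"cell {cell}: combine meshes {mesh_indices}")
```

Each cell then becomes one combined mesh, so whole cells can still be frustum or occlusion culled instead of everything being drawn as one giant mesh.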

GPU: Shadow filtering quality - HDRP has different shadow filtering quality options that control how soft the shadows are. “High” is the most expensive, and the issue with high quality shadow filtering is with smaller lights (point/spot). With high filtering, these lights get a large increase in performance cost, much larger than the directional light does. Worst of all, that cost can’t be cached, so you pay it no matter what as long as the shadows are on.
The increased performance cost from high quality shadow filtering shows up under “deferred lighting” in Unity’s profiler with the GPU module enabled.

If your project utilizes point/spotlight shadows, consider switching to medium.
HDRP developers should separate the option into two, one affecting the directional light & another for point/spotlights. The cost of high shadow filtering on the directional light isn’t as crazy as on point lights.

Both: Baked lighting - You can set lights to baked, so those lights have zero runtime performance cost, and their shadows too if you want. You will lose specular lighting though; only real-time & mixed lights are capable of that without custom shaders.
The price you pay is baking time and fighting Unity’s lightmapper (difficulty: impossible).

You can use Bakery. At the cost of 3 extra textures (regular png, not hdr) you will get speculars (not perfect though). It’s also easier to bake (gpu based), works with prefabs and so on… but has its own quirks and problems. Still, it’s worth a try.

btw awesome thread! Thanks for sharing.

edit:
added Managing GPU usage for PC and console games | Unity
& Performance optimization for high-end graphics | Unity
also added a bit about graphics jobs
and some info about combining meshes.

edit:
added shadow cascades and performance examples between different cascades
& Added more info

A poem about game performance: don’t ask why…

Performance, performance, ever so dear,
GPU and CPU, always near.
Frames per second, smooth and fast,
A game’s true measure, to make it last.

Latency low, inputs precise,
A player’s experience, truly nice.
Optimization and tuning, a never-ending quest,
To make the game, truly the best.

Memory and bandwidth, always in mind,
To avoid stutters, of any kind.
Performance, the key to a great game,
A true obsession, it will always remain.

So let us strive, to make it shine,
Performance, the heart of the design.
For a game that runs well, is truly divine,
And leaves players, forever entwined.

  • ChatGPT

added (2):
Both: There’s an issue that few people notice when using the SRP batcher, which is that it will separate draw calls if they have different shader keywords, forcing a new draw call.

With SRP batcher, as long as you use the same shader (even with different materials), it will improve your performance, but different keywords per material can reduce its efficiency.

To find out more, you need to use the frame debugger. From that you can tell why some calls were separated and weren’t combined with the previous SRP batcher call. Usually the cause is a completely different shader, or a difference in shader keywords (imagine two materials that use the same shader, but one material has receive SSR enabled and the other has it disabled).

Also, your materials will keep keywords from previous shaders, even though they’re useless, so this will mess with the SRP batcher even more.

To remove them, follow this video:

y5o2xp
If you have any “invalid keywords”, click on the “-” to remove.

To see the difference this can make, same scene, same camera view:


25 steps to render Gbuffer, 171 overall for the frame.

After some keywords changes:

Down to 8 steps for the Gbuffer, and 135 total.

Also, you can see “Batch cause, SRP: node use different shader keywords.” or “different shader” in the pictures.
This gives you the reason why a call had to be its own draw call and wasn’t part of the previous SRP batch, and so on.

In this scene, SRP batcher can be improved even further with more keyword changes & shader changes.
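
The keyword effect described above can be illustrated with a toy model: walk the draws in order and start a new batch whenever the shader or the keyword set differs from the previous draw. This is a simplified sketch and not Unity's actual SRP batcher logic (which has other reasons to break a batch); the shader and keyword names are placeholders for illustration.

```python
def count_batches(draws):
    """Count batches in a toy model of keyword-based batch breaking.

    draws: ordered list of (shader_name, keywords) pairs.
    A new batch starts whenever the shader or the keyword set changes
    compared to the previous draw.
    """
    batches = 0
    previous_state = None
    for shader, keywords in draws:
        state = (shader, frozenset(keywords))
        if state != previous_state:
            batches += 1
            previous_state = state
    return batches


# Same shader everywhere, but one material carries a different keyword.
mixed = [
    ("Lit", {"KEYWORD_A"}),
    ("Lit", {"KEYWORD_A", "KEYWORD_B"}),
    ("Lit", {"KEYWORD_A"}),
    ("Lit", {"KEYWORD_A"}),
]
# After making the keywords match across materials.
unified = [("Lit", {"KEYWORD_A"})] * 4

print("mixed keywords:", count_batches(mixed), "batches")
print("unified keywords:", count_batches(unified), "batches")
```

In this model a single odd material in the middle of the list costs two extra batches, which matches the kind of reduction seen in the frame debugger numbers above.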

& added to misc info:
Profiler & frame debugger: Improving performance has one extremely important requirement: knowing your bottleneck and the performance cost of your scene.
It is absolutely essential that you learn to use the Unity profiler, both the CPU and GPU modules, and to a lesser degree the frame debugger, which shows exactly how many steps are taken to render a single frame of your project. The frame debugger can also help you debug the SRP batcher and improve its efficiency, leading to better performance (see "SRP batcher & efficiency" above).
