Dynamic shader variant loading

Greetings from the Shaders team!

Unity 2023.1.0a11 brings dynamic shader variant loading. This feature allows you to manage the runtime memory usage of shaders.

The bulk of shader memory consumption in the player runtime comes from two areas: variant parameters and variants themselves. When we do a build, we compile individual variants for each shader, pack their parameters and the code together into chunks and compress each chunk individually. When a shader gets loaded, we decompress all chunks, load all variant parameters and prepare all variants for compilation by the GPU driver when rendering requests them.
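
To make the build-time step above concrete, here is a minimal Python sketch of packing variants into size-limited chunks and compressing each chunk on its own. It is only an illustration: the greedy grouping, the use of zlib, and the names `pack_into_chunks` and `max_chunk_bytes` are assumptions made for this example, not Unity's actual build code.

```python
import zlib

def pack_into_chunks(variants, max_chunk_bytes):
    """Greedily group variant blobs into chunks of at most max_chunk_bytes,
    then compress each chunk individually (hypothetical model)."""
    chunks, current, current_size = [], [], 0
    for blob in variants:
        # Close the current chunk when this blob would push it over the limit.
        # A single blob larger than the limit still gets a chunk of its own.
        if current and current_size + len(blob) > max_chunk_bytes:
            chunks.append(current)
            current, current_size = [], 0
        current.append(blob)
        current_size += len(blob)
    if current:
        chunks.append(current)
    return [zlib.compress(b"".join(chunk)) for chunk in chunks]

# Ten made-up 300 KiB variant blobs packed with a 1 MiB chunk limit.
variants = [bytes([i]) * (300 * 1024) for i in range(10)]
compressed = pack_into_chunks(variants, max_chunk_bytes=1024 * 1024)
print(len(compressed), "chunks")
```

A smaller chunk size yields more, smaller chunks, so less unrelated data has to be decompressed when a single variant is requested.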

Dynamic shader variant loading enables dynamic decompression of the chunks mentioned above and exposes control over two settings: the size of chunks during the build in megabytes and the maximum number of chunks that are kept decompressed simultaneously for each shader at runtime. Both settings can be configured globally and overridden for each platform. The default values are 16 megabytes per chunk, unlimited chunks. We treat 0 chunks as “no limit”.
Additionally, you can use Shader.maximumChunksOverride to override the chunk limit at player runtime for any shaders loaded after changing this value. The default value is -1, which means “do not override”. Setting this property to a positive value sets a fixed limit on the number of loaded chunks; 0 is treated as “no limit”, as with the build-time setting.
When all variants of a shader fall within the chunk limits, we preload all variants, as in the default case.
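
The way the build-time setting and the runtime override combine can be sketched in Python as follows. This models only the rules stated above; the function names are hypothetical, and rejecting negative values other than -1 is a choice made for this sketch, not documented behavior.

```python
UNLIMITED = None  # sentinel used by this sketch for "no limit"

def effective_chunk_limit(build_setting, runtime_override=-1):
    """Resolve the per-shader chunk limit from the two settings."""
    if runtime_override < -1 or build_setting < 0:
        raise ValueError("unsupported value in this sketch")
    # -1 means "do not override": fall back to the build-time setting.
    value = build_setting if runtime_override == -1 else runtime_override
    # 0 means "no limit" for both the setting and the override.
    return UNLIMITED if value == 0 else value

def preloads_everything(chunk_count, limit):
    """All variants are preloaded when the shader fits within the limit."""
    return limit is UNLIMITED or chunk_count <= limit

print(effective_chunk_limit(build_setting=0))                      # default: no limit
print(effective_chunk_limit(build_setting=0, runtime_override=1))  # limited to 1 chunk
print(preloads_everything(chunk_count=5, limit=1))                 # False
```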

Please note that these limits only affect variants themselves, and have no effect on variant parameters.

We measured the memory savings and performance implications on two projects, Boat Attack and an artificial scene with one shader that has 30 000 variants. Measurements were taken with two sets of settings: default (up to 16 MiB chunk size, unlimited chunks) and with dynamic variant loading enabled (up to 1 MiB chunk size, up to 1 chunk loaded per shader). All measurements were performed on a MacBook Pro M1.
Memory usage in the artificial scene was reduced from 122.9 MiB (default) to 47 MiB (dynamic loading), a 61.8% reduction; in Boat Attack, from 315 MiB (default) to 66.8 MiB (dynamic loading), a 78.8% reduction.
Initial loading is faster with dynamic variant loading as well: the artificial scene loaded the heavy shader in 41.58 ms (dynamic loading) instead of 64.68 ms (default), or 35.7% faster; Boat Attack loaded the shaders in 46.89 ms (dynamic loading) instead of 114.4 ms (default), or 59% faster.
Of course, this is not entirely free. Loading individual variants when they are required takes roughly 10% more time: 0.25 ms per variant with dynamic loading and 0.23 ms per variant with default settings.
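
For anyone who wants to check the percentages, this short Python snippet recomputes them from the numbers quoted above. The helper name `percent_change` is made up for this example.

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, as a percentage of `before`."""
    return (before - after) / before * 100

memory_artificial = percent_change(122.9, 47.0)   # MiB
memory_boat_attack = percent_change(315.0, 66.8)  # MiB
load_artificial = percent_change(64.68, 41.58)    # ms
load_boat_attack = percent_change(114.4, 46.89)   # ms
# Per-variant overhead: dynamic loading (0.25 ms) relative to default (0.23 ms).
variant_overhead = (0.25 - 0.23) / 0.23 * 100

for name, value in [
    ("memory, artificial scene", memory_artificial),
    ("memory, Boat Attack", memory_boat_attack),
    ("load time, artificial scene", load_artificial),
    ("load time, Boat Attack", load_boat_attack),
    ("per-variant overhead", variant_overhead),
]:
    print(f"{name}: {value:.1f}%")
```

The per-variant overhead works out to about 8.7%, which matches the "roughly 10%" figure above.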

We plan to backport this to 2022 and 2021.3 LTS.

Stay tuned for more!

@aleksandrk

Thanks, sounds good. Waiting for the backport to 2021 LTS.
We updated the project from 2019 LTS to 2021 about 1.5 weeks ago, which increased the build time by xx% because the shader variants now go into the millions.

Any idea when we can get a fix for 2021 LTS?

This looks to be a change to runtime shader loading. So it won’t affect compile times at all.

This is correct.

By the way, dynamic variant loading will also be available in 2022.2.0b10, 2022.1.21f1 and 2021.3.12f1.

Do we have to do anything to enable it, or does it just work straight away?

@fendercodes you need to change the chunk settings to enable it. By default it’s 16 MiB chunks, 0 chunks (unlimited). If you change the number of chunks to a positive value, it will be enabled.
The chunk settings are available in Player settings.

Just to make sure, it will be available in 2022.2.0+ as well? That is, in all versions released from now on? Will dynamic shader loading work on WebGL as well? I am already waiting for Monday to try it out :slight_smile:

Yes, it should work on all platforms in all 2022.2 versions starting from beta 10.
And on 2021.3 patch 12 and later.

Greetings. I am working with Unity 2022.2.1 now (though the same goes for all previous versions) and the Addressables package. I have a problem. This is the workflow that I expect to work:

  1. Go to Player Settings → Graphics → Save Shader Variant Collection.
  2. Add that SVC file to addressables.
  3. Build addressables with addressables scenes.
  4. In Player Settings → Other Settings, I set the shader default chunk size to 16 and the default chunk count to 2.

The expected behavior is that, when the scene starts and loads, only the needed shaders are loaded into memory. However, what I get is shaders using over 300 MB.

Memory profiler report

With custom shader stripping code I am able to reduce it to ~100 MB.

So my question is: am I doing something incorrectly? Is it a problem with Addressables? Or is there something I am missing in order to use dynamic shader loading?

Extra info:
Platform: WebGL
43 shaders and total 91 variants used.
In the Editor, URP Lit uses 0.8 MB.

Hi!
You need to set the chunk parameters before building the addressables - the chunk size setting specifically affects the build.
Try setting the chunk size to 1MB - does it take less memory after doing that?

First of all, thank you for your time,

I know that this is more of a problem with Addressables than with dynamic shader loading, but I am very thankful for your help.

After reducing the chunk size to 1 MB, the memory size of URP/Lit was reduced:

  • Universal/Lit shader is in a “Packed separately” Addressables group together with the shader variant collection: reduced to 67.6 MB.
  • Universal/Lit shader is NOT added manually to an Addressables group, but the shader variant collection is added manually to an Addressables group: reduced to 273.8 MB.

From these results, my guess is that Addressables loads all possible shader variants for specific keywords. When everything is in the same Addressables group, it loads less. But regardless of that, the size of loaded shaders in memory is still way too big, so I can only guess that dynamic shader loading does not fully work with it.

For example, in the Editor, URP/Lit reports about 0.4 MB of memory (as the Editor does not use Addressables directly).

On iOS, this literally slashed my app’s total memory usage in half. Fantastic work, crew.

Firstly, I’d like to say that I have very limited knowledge of how shaders are processed, so my question may make no sense (understandably). Could you explain why decompressed chunks stay in memory? If we’re done with them once they are uploaded to the GPU, I don’t understand why they remain in memory. If we need them while the shader is in use, then if we set the maximum number of chunks to 1 and the scene uses multiple shader variants (chunks), won’t these chunks fight for the single available chunk slot, causing an infinite loop?

A single chunk usually contains multiple variants of the same shader. Variants are loaded on demand, so as soon as you, for example, move or turn the camera, a new object can come into view and it may need a variant that wasn’t loaded yet.

So no, we don’t need them while the shader is in use, we need them only when loading an individual variant.

OK so it’s like cache prefetching in a way, got it! Thanks for the explanation :slight_smile: 0.02 ms difference per variant with default settings sounds like a very insignificant con so the pros heavily outweigh the cons IMO, thanks for the new feature!

This is great, we are fighting with shader variants and this helped to reduce our memory usage on scene load by 50%

Would you be able to give more detail on the maximum chunks value?
“the maximum number of chunks that are kept decompressed simultaneously for each shader at runtime.”

Kept decompressed where? In memory, I assume?
What happens to chunks that are no longer needed, or when the number of chunks needed is > maximumChunks?
If unneeded chunks stay in memory, are they compressed again to save memory?

Finally, is there any way we can gauge a good chunk size and max number of chunks?
From your other comments, it sounds like once stuff is loaded into memory it’s not unloaded. So if we know we have lots of variants, because we aren’t stripping them yet, setting the chunk size to 1 MB and 1 chunk max would theoretically load in the fewest variants we need and avoid the bloat from the unused variants?

That’s correct.

We keep the compressed data in memory if we don’t load everything at once. Shaders have a very high compression ratio, so it doesn’t cost much.
The least recently used chunk is simply unloaded. Since the compressed data is readily available, it will get decompressed again if needed.
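
As a rough illustration of that behavior (not Unity's actual implementation), here is a small Python model of a per-shader chunk cache with least-recently-used eviction. The class name `ChunkCache`, the use of zlib, and the `decompress_count` counter are assumptions made for this sketch.

```python
import zlib
from collections import OrderedDict

class ChunkCache:
    """Keeps all compressed chunks resident and at most `max_decompressed`
    chunks decompressed at a time (0 means no limit)."""

    def __init__(self, compressed_chunks, max_decompressed):
        self.compressed = compressed_chunks
        self.max_decompressed = max_decompressed
        self.decompressed = OrderedDict()  # chunk index -> bytes, oldest first
        self.decompress_count = 0

    def get(self, index):
        if index in self.decompressed:
            # Cache hit: mark the chunk as most recently used.
            self.decompressed.move_to_end(index)
            return self.decompressed[index]
        # Cache miss: decompress from the always-resident compressed data.
        data = zlib.decompress(self.compressed[index])
        self.decompress_count += 1
        self.decompressed[index] = data
        if self.max_decompressed and len(self.decompressed) > self.max_decompressed:
            # Over the limit: drop the least recently used chunk.
            self.decompressed.popitem(last=False)
        return data

# Three made-up chunks, with at most one kept decompressed.
chunks = [zlib.compress(bytes([i]) * 1024) for i in range(3)]
cache = ChunkCache(chunks, max_decompressed=1)
for index in (0, 0, 1, 0):
    cache.get(index)
print(cache.decompress_count)  # 3: chunk 0 is decompressed again after eviction
```

With a limit of 1, alternating between two chunks costs a decompression on every switch, but it never deadlocks or loops: each request just replaces whatever chunk was resident.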

You need to decide what “good” looks like first :slight_smile:
If your goal is to reduce memory usage above all else, 1 chunk with 1MB is perfect, as it will, indeed, keep the least potentially unnecessary data around. This may come with increased load time or a bit longer frame times when a variant is needed that is in a chunk that is not currently decompressed.
From our tests, not loading all those variants up front usually saves more loading time than what’s spent later on decompression, but this is definitely HW-dependent, so you should profile on your lowest target devices to make informed decisions.

Thank you for the information, that really helps!

Hi @aleksandrk

Will the “dynamic shader variant loading” help improve build times? (Not at runtime) … sorry I’m new to all this.

Can we trouble someone on the Unity team to make tutorials on how to reduce the shader variants compilation time? Or at least best practices?

I’ve been trying to make a build the whole day … my pc is still building… it’s probably at 9 hours now. :sweat_smile:

I’m clearly not doing something right…

Unity 2021.1.28f1

Thanks for the help