Compute shader multi_compile support

Hey!

2020.1.0a9 is out, and with it support for the multi_compile directive in compute shaders has arrived.

Until now, creating compute shader variants has been tedious work: hand-crafting different kernels for different variations (by combining #pragma kernel and preprocessor defines) and then manually mapping the correct kernels to the correct variants at runtime.

This new feature brings the keyword-based variant management used in regular shaders to compute shaders too.

**New shader syntax**
Compute shader code now has support for multi_compile pragmas similar to regular shaders. The pragma affects all the kernels in a .compute file.

    #pragma multi_compile_local __ KEYWORD_X KEYWORD_Z

**New & affected API**
The current global keyword API now affects compute shaders too:

- Shader.EnableKeyword/DisableKeyword
- CommandBuffer.EnableKeyword/DisableKeyword

Added new keyword API to ComputeShader class, matching what we have in Material:

- ComputeShader.EnableKeyword/DisableKeyword
- ComputeShader.IsKeywordEnabled
- ComputeShader.shaderKeywords
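
As a rough sketch of how the calls listed above might fit together in a script: the component name, the serialized field and MY_GLOBAL_KEYWORD are placeholders made up for illustration, and KEYWORD_X is borrowed from the pragma example above. It assumes the assigned .compute asset actually declares these keywords.

```csharp
using UnityEngine;

// Hypothetical example component; all names here are illustrative only.
public class ComputeKeywordExample : MonoBehaviour
{
    // Assign a .compute asset that declares KEYWORD_X via multi_compile_local.
    public ComputeShader shader;

    void Start()
    {
        // Per-shader keyword state, mirroring the Material keyword API.
        shader.EnableKeyword("KEYWORD_X");
        Debug.Log(shader.IsKeywordEnabled("KEYWORD_X"));

        // shaderKeywords exposes the currently enabled keywords as a string array.
        Debug.Log(string.Join(" ", shader.shaderKeywords));

        shader.DisableKeyword("KEYWORD_X");

        // Global keyword state. Assumption: MY_GLOBAL_KEYWORD is declared with a
        // plain (non-local) multi_compile pragma in the compute shader.
        Shader.EnableKeyword("MY_GLOBAL_KEYWORD");
    }
}
```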

**Notes**

- No “select best matching variant” logic is built into this system. This means that the user is responsible for selecting a valid keyword combination. We do mask in only the keywords that are actually declared in the shader, but for keyword sets that don’t have the empty “__” option, it is the user’s responsibility to pick a valid option. Otherwise the dispatch fails with an error.

- At the moment all the variants are compiled at import time, so be careful about adding too many variants. On-demand/background compilation and scriptable stripping support for compute shaders are planned items but not there yet.

- This is the very first release of the feature so some hiccups are definitely possible. Any early bird usage providing feedback and/or bug reports would be very much appreciated.
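
Regarding the first note, one way to keep a keyword set in a valid state is to route every change through a small helper that enables exactly one keyword and disables the others. This is only a sketch: ComputeKeywordSet, its Select method and the keyword names in the usage line are hypothetical and not part of the Unity API.

```csharp
using System;
using UnityEngine;

// Hypothetical helper, not a Unity API.
public static class ComputeKeywordSet
{
    // Enables `selected` and disables every other keyword in `group`.
    // `group` should list the keywords of one multi_compile line.
    public static void Select(ComputeShader shader, string[] group, string selected)
    {
        // Validate first so an invalid request never leaves the set with nothing enabled.
        if (Array.IndexOf(group, selected) < 0)
            throw new ArgumentException("Keyword '" + selected + "' is not part of the group.");

        foreach (string keyword in group)
        {
            if (keyword == selected)
                shader.EnableKeyword(keyword);
            else
                shader.DisableKeyword(keyword);
        }
    }
}
```

Usage might look like `ComputeKeywordSet.Select(shader, new[] { "MODE_A", "MODE_B" }, "MODE_B");` before dispatching.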

Thanks!

This news makes my day! Gonna try it out!


Nice !

Hey @Juho_Oravainen !

I have quickly tested for Aura.
The first observation is that, since I need to keep my asset working for earlier versions, I’d need to branch the compilation from 2020 and up.

However, I don’t know if I am doing it wrong or if it’s a matter of order dependency but doing this doesn’t work and still compiles two kernels :

    #pragma kernel ComputeVisibleCells
    #if UNITY_VERSION < 202010
    #pragma kernel ComputeVisibleCells OCCLUSION
    #else
    #pragma multi_compile_local __ OCCLUSION
    #endif

Cheers!

Yeah, utilizing this feature on systems that need to work also on older versions might need some trickery atm.

Unfortunately, right now the pragmas are not affected by preprocessor defines, i.e. pragmas are parsed before the preprocessor runs. There might be some change coming in this regard but I can’t promise it’ll make it to 20.1. So you might want to wait for a few weeks before starting any bigger work on this sort of use case.

However, an alternative approach to this problem could be having a setup script for your system that checks the editor version and copies/generates the correct kernel pragmas and possibly also the variant selection utility scripts. This way after the setup step the project would have only the versions that work on that Unity version.
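
A minimal sketch of the version check such a setup script could be built around. KernelPragmaGenerator and its methods are hypothetical names; the pragma lines are copied from the snippet posted above, and the check only looks at major.minor, so it treats every 2020.1 build as supporting multi_compile.

```csharp
using System;

// Hypothetical helper for a setup script; not a Unity API.
public static class KernelPragmaGenerator
{
    // True when the version string (e.g. "2020.1.0a9") is 2020.1 or newer.
    public static bool SupportsMultiCompile(string unityVersion)
    {
        string[] parts = unityVersion.Split('.');
        int major = int.Parse(parts[0]);
        int minor = int.Parse(parts[1]);
        return major > 2020 || (major == 2020 && minor >= 1);
    }

    // Returns the pragma lines to place at the top of the generated .compute file.
    public static string BuildPragmas(string unityVersion)
    {
        if (SupportsMultiCompile(unityVersion))
        {
            return "#pragma kernel ComputeVisibleCells\n" +
                   "#pragma multi_compile_local __ OCCLUSION\n";
        }

        // Older editors: keep the manual kernel variant declarations.
        return "#pragma kernel ComputeVisibleCells\n" +
               "#pragma kernel ComputeVisibleCells OCCLUSION\n";
    }
}
```

Inside the editor, the version string would typically come from `Application.unityVersion`, and the result would be written into the generated shader file before it is imported.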

In any case, your use case is a bit of a difficult one. As long as you need to support older versions you will need to keep the old manual kernel variant declarations and runtime selection scripts around one way or another…


Can you add support to ComputeShader.FindKernel for finding variants with keywords? E.g. I have an Ocean Simulation that operates on pow2 grid sizes, so I imagine the new code will look like this:

    #pragma kernel _Ocean_Simulation
    #pragma multi_compile_local _GRID_32 _GRID_64 _GRID_128

So in my C# script, it would be great if I could do something like the following:

    public int gridSize;
    public ComputeShader oceanShader;

    private int csIndex;

    void Awake()
    {
        // Requested overload taking a keyword; this is the proposal, not an existing API.
        csIndex = oceanShader.FindKernel("_Ocean_Simulation", $"_GRID_{gridSize}");
    }

    void Update()
    {
        oceanShader.Dispatch(csIndex, 1, 1, 1);
    }

With the new variant system you don’t have a separate kernel index for each variant. You just find the base kernel and then enable the keywords you need. E.g.:

    public int gridSize;
    public ComputeShader oceanShader;

    private int csIndex;

    void Awake()
    {
        // One kernel index covers all variants; the enabled keyword selects the variant.
        csIndex = oceanShader.FindKernel("_Ocean_Simulation");
        oceanShader.EnableKeyword($"_GRID_{gridSize}");
    }

    void Update()
    {
        oceanShader.Dispatch(csIndex, 1, 1, 1);
    }

Hi! Thanks for the reply.
I understand that this situation might be tricky.
The main advantage for me would be to claw back some performance, since I switched from 1000s of variants to runtime evaluation of uniform bools/ints.
Since all the variants are still compiled at import (in my case more than an hour of import, which is why I moved to ifs) and backward compatibility would require black-magic AssetPostprocessor events, I’ll leave everything I’ve done so far as it is and wait for a way to make the kernel pragmas depend on the Unity version, and for on-demand compilation or stripping.
However, I’ll switch to that new system for the new stuff I’ll do.

Anyway, I’d like to thank you for working on this. When I asked a few months ago, you told me that it wasn’t on track but you’d try to put it on the table. Thanks for the followup!


This is awesome! Thanks a lot.

Does this mean you’re also planning to add the shader_feature macro?

Nope. shader_feature on the regular shaders basically checks if the variant is referenced by any Material in the project. As long as there is no material concept for compute shaders we cannot really have shader_feature for them.

In other news, with the latest alphas we’ve moved the compute shader variant compilation from import time to background jobs, with a synchronous on-demand fallback when a variant is immediately required by the editor. This way one does not need to wait ages for the compute shader import to finish, making complex compute shader usage much nicer. (Note: this refactoring caused a nasty compute shader regression when targeting multiple graphics APIs. The regression is fixed in a19.)

On the front of handling “old style” variants and multi_compile in the same system: we have introduced a new shader preprocessor ( https://forum.unity.com/threads/new-shader-preprocessor.790328 ) which allows having different pragmas based on the UNITY_VERSION define. With this it is possible to create shaders that utilize manual kernel macros on older Unity versions and multi_compile on newer versions. The new preprocessor is still opt-in at this point but it will eventually become the default option. Hopefully sooner rather than later, but no promises, as usual :)


True. I think my use case (of defining shader variants for debug visualizations) would be covered by a UNITY_EDITOR macro for shader code. I think this doesn’t exist currently. Do you think this would work? Or is there a better solution?

Unfortunately we don’t have a UNITY_EDITOR macro for shaders atm. Having Editor-only variants for debug visualization is a perfect use case for scriptable stripping but we don’t support it for compute shaders… Stripping is a planned feature but it’s not there yet (and I can’t give any estimates/promises really).

So in the meantime you’ll need to figure out some alternative way to achieve this. Here are a couple of options I could come up with quickly:

  • Refactor the shader so that most of the body is in include files. Then have a separate .compute file for the player version and for the debug viz version, each just including the generic code and building on top of it. Switch the shader objects with an Editor script to enable/disable debug viz. Exclude the Editor-only shaders from the player build.
  • Enable/disable the debug mode with a hardcoded #define instead of a keyword (have all the debug code behind #ifdef MY_DEBUG_VIZ). To be able to switch the define easily from a script, you could put it in a separate include file that is regenerated by an Editor script when switching the mode. Then, with a build script, make sure that the include always has the debug viz undefined so that the release version ends up in the build.
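
For the second option, the regenerated include could be produced by something like the sketch below. DebugVizInclude and its methods are hypothetical names, as is any file path passed in; only the MY_DEBUG_VIZ define comes from the suggestion above.

```csharp
using System.IO;

// Hypothetical generator for the include file holding the debug define.
public static class DebugVizInclude
{
    // Returns the text of the include file for the requested mode.
    public static string BuildContents(bool debugEnabled)
    {
        string header = "// Auto-generated; do not edit by hand.\n";
        return debugEnabled ? header + "#define MY_DEBUG_VIZ\n" : header;
    }

    // Overwrites the include file at `path` with the contents for the requested mode.
    public static void Write(string path, bool debugEnabled)
    {
        File.WriteAllText(path, BuildContents(debugEnabled));
    }
}
```

An Editor menu item could call `Write` with the desired mode and then `AssetDatabase.Refresh()` so that the shaders including the file get reimported, while a build script would call `Write(path, false)` before building the player.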

Thank you very much for your suggestions. Using separate .compute shader files may be a suitable workaround.

I think compute shaders are a very powerful tool, so any improvements and features in this area are very welcome!

One other small annoyance that I experienced while working with compute shaders is that recompiling them at runtime loses all texture references. I assume this is also related to the fact that there is no Material correspondent for a compute shader. I was able to find a workaround by writing an AssetPostprocessor that resets the references after reimporting a shader. This is not very elegant, because you have to filter all asset imports, even though you’re only interested in a couple of shaders. And it doesn’t detect when #include dependencies change. Maybe with a ComputeShader.onCompile callback we could implement something similar to Material for compute shaders that manage references and properties.
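
For reference, the AssetPostprocessor workaround described above could look roughly like this. ComputeShaderReimportWatcher and its event are hypothetical names; the callback signature is Unity's OnPostprocessAllAssets. As noted, it has to look at every imported asset and will not fire when only an #include dependency changes.

```csharp
using System;
using UnityEditor;

// Editor-only sketch; place it in an Editor folder.
public class ComputeShaderReimportWatcher : AssetPostprocessor
{
    // Hypothetical event; subscribers re-assign textures and buffers on the reimported shader.
    public static event Action<string> ComputeShaderReimported;

    static void OnPostprocessAllAssets(string[] importedAssets, string[] deletedAssets,
                                       string[] movedAssets, string[] movedFromAssetPaths)
    {
        foreach (string path in importedAssets)
        {
            // Filter the full import list down to compute shader assets.
            if (path.EndsWith(".compute", StringComparison.OrdinalIgnoreCase))
                ComputeShaderReimported?.Invoke(path);
        }
    }
}
```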

Hello! Thanks for following this up.
Are you talking about the UNITY_OLD_PREPROCESSOR define? I am confused. Will it be included in updates of old Unity versions?

UNITY_VERSION macro can also be used to select #pragma parameters, as it only depends on the Unity version.

Yes, UNITY_OLD_PREPROCESSOR will be included in older versions as well. I’ll post on the preprocessor thread when it’s there.

So satisfying!


This seems like a low hanging fruit for optimizing compile time and compiled compute shader size. I have a compute shader with many kernels and many keywords, and only one of the kernels is actually using the keywords. The compiler could check what keywords are present in each kernel (including in the methods it uses), and then only compile the variants needed.

Meanwhile, is there a workaround for this?

This is way more involved than it sounds.
The keywords are evaluated during the preprocessing stage, before HLSL is even parsed.

Yes, you can use separate .compute files.

Thanks for the quick response @aleksandrk . I see, so the variants are spun out just from looking at the pragmas. Having separate compute shaders is a viable work-around … it just feels redundant because most buffers and constants happen to be shared in this case. I guess that could be cleaned up using includes.

It would be convenient if either #pragma kernel or #pragma multi_compile_local could somehow take extra arguments for explicit inclusion or exclusion. Not sure what that would look like in terms of syntax though. Perhaps worse 👀

I’d prefer to introduce a bit of structure to compute shaders. This way it would be simpler to declare what goes where.