Poor Quality of Generated Shaders

You can see an example in the attached image. There's a lot in the generated shader that looks unnecessary: bit shifts, as well as calls to "bitFieldInsert". Why does this even happen in the first place? According to the Xcode profiler, just removing these (they also contribute to the "integer ALU" section) would improve performance by ~20%, assuming only half of the integer ALU work goes away. That's a pretty good saving for the exact same functionality. Some of these shifts appear to just be divides that get converted to shifts because the compiler detects the divisor is a power of two (POT), but some seem to be unnecessary.
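To illustrate the POT case (a made-up snippet, just the shape of the pattern I mean):

uint tileIndex = pixelIndex / 64u;   // what I write (POT divisor)
uint tileIndex = pixelIndex >> 6u;   // what the compiler emits instead (same result for unsigned ints)

That kind of strength reduction I can understand; it's the shifts with no obvious source operation that puzzle me.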

The generated shaders also seem to be using fma instead of matrix multiply functions, which I'd guess is slower (if the hardware has dedicated matrix multiply support; not the best evidence, I know).
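Roughly what I'm seeing (a sketch with made-up names, not the literal output):

// what I write in HLSL: a single matrix multiply intrinsic
float4 clipPos = mul(UNITY_MATRIX_MVP, float4(positionOS, 1.0));

// the shape of the generated MSL: a column-by-column fma chain,
// not anything resembling a matrix multiply call
//   u_xlat0 = matMVP[1] * positionOS.yyyy;
//   u_xlat0 = fma(matMVP[0], positionOS.xxxx, u_xlat0);
//   u_xlat0 = fma(matMVP[2], positionOS.zzzz, u_xlat0);
//   u_xlat0 = u_xlat0 + matMVP[3];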

So why is the generated shader code so poor, and how can it be fixed? Any Unity shader/rendering team experts here?

Hi!
Do you have an example of unnecessary bit shifts and calls to BFI?

Thanks. Well, I'm confused because it all seems unnecessary. If I write the shader without BFI, why does it show up in the generated code?

Well, that’s a result of the compiler optimizing things then :slight_smile:


Ah yes, I see it now. It's using BFI to optimize some integer multiplies and divides. Thanks :slight_smile:
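For anyone else who hits this, the pattern is something like (made-up example):

// multiply by a POT plus an add of a small value; layerIndex is 0..3
uint texelIndex = baseIndex * 4u + layerIndex;

// multiplying by 4 just vacates the two low bits, so instead of a real
// multiply and add, the compiler can emit a shift plus a bit-field insert
// of layerIndex into those bits (in MSL terms, roughly
// insert_bits(baseIndex << 2u, layerIndex, 0u, 2u))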


@aleksandrk got another one for you :smile: Any help would be very much appreciated.

Generated MSL:

float3 u_xlat9;
half3 u_xlat16_9;
u_xlat16_9.xyz = _NormalTextures.sample(sampler_NormalTextures, u_xlat1.xy, round(u_xlat1.z), gradient2d(float4(u_xlat2.xyzx).xy, float4(u_xlat9.xyzx).xy)).xyz;
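// note (my annotation): the fma below runs in float, not half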
u_xlat9.xyz = fma(float3(u_xlat16_9.xyz), float3(2.0, 2.0, 2.0), float3(-1.0, -1.0, -1.0));

And the original HLSL:

#define SAMPLE_TEX2DARRAY_GRAD_CUSTOM(tex,coord,dx,dy) UNITY_SAMPLE_TEX2DARRAY_GRAD(tex,coord,dx,dy)

inline half3 UnpackNormalXYZ(half3 packednormal)
{
    half3 normal;
    normal.xyz = packednormal.xyz * 2.0h - 1.0h;
    return normal;
}

half3 normal0 = SAMPLE_TEX2DARRAY_GRAD_CUSTOM(_NormalTextures, uvData.xyz, dx, dy);

half3 normal = UnpackNormalXYZ(normal0.xyz);

Notice how the last line of the Metal shading code is using floats. This is incorrect; it should be using halves. This really matters on mobile, since mobile GPUs actually have FP16 math capabilities, so doing this work in FP32 is overkill. And it seems the shader generator is forcing a lot of halves to floats like this throughout a lot of the shaders; this is just one example.

EDIT:

It seems what is happening is that floats are polluting the values they feed into down the line, even when those values are explicitly cast to half.

I have a function here:

half3 calculateViewDir(float3 worldPosition)
{
    return half3(normalize(_WorldSpaceCameraPos.xyz - worldPosition.xyz));
}

and just looking at the Metal shader code, I can tell that every value that interacts with this return value afterwards is promoted to float. But this shouldn't happen, as I explicitly convert the result to half3 (and the return type is half3, too).
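For example, with follow-on code like this (made up, but representative of what I'm seeing):

half3 viewDir = calculateViewDir(worldPosition); // declared half3
half  nDotV   = saturate(dot(normal, viewDir));  // yet in the generated MSL
                                                 // this dot runs in float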

rewrite normal.xyz = packednormal.xyz * 2.0h - 1.0h; as normal.xyz = packednormal.xyz * half(2.0) - half(1.0); and all will be fine :slight_smile:
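i.e. the whole helper becomes:

inline half3 UnpackNormalXYZ(half3 packednormal)
{
    half3 normal;
    // explicit half() conversions instead of the "h" literal suffix
    normal.xyz = packednormal.xyz * half(2.0) - half(1.0);
    return normal;
}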

This is unlikely. I think the problem is elsewhere :slight_smile:

Yes, I came to the same realization earlier today. Thanks!! Why is that, though? It's confusing that half literals even compile if they aren't treated as such (and it's weird because it seems to sometimes work??). It would be nice to see this in the docs (maybe it's there already, but I didn't find anything about it).

That’s because the compiler we use (FXC) ignores the “h” suffix when it’s dealing with precision information, but doesn’t ignore those explicit conversions.
Yes, I’ll ask someone to update the docs :slight_smile:
