How bad is an if-case?

I found out that I could add an if-case in my shader to eliminate a graphics glitch when using MSAA. But I’ve read if-cases in shaders are bad for performance, and that it can cause both branches of the shader to be executed or something like that (doesn’t sound very efficient).

So could anyone with some experience with shaders tell me if they think this will cause a noticeable performance hit? Or should I just write two versions of the shader and only use this one when MSAA is enabled?

I’m just worried that code like this is wasteful and will make the game fragment-bound sooner.

Here’s a snippet from the shader with the if-case.

                float zdepth =
                    LinearEyeDepth (
                        tex2Dproj(_CameraDepthTexture, UNITY_PROJ_COORD(i.ref)).r);
                float waterDepthValue = zdepth - i.ref.w;

                if (waterDepthValue < -0.05f) {
                    foam = 0.0f;
                }
                else {
                    float intensityFactor =
                        1.0f - clamp(waterDepthValue / _ShoreDistance, 0.0f, 1.0f);
                    half3 foamGradient = _ShoreStrength -
                        tex2D(_FoamGradient,
                            float2(intensityFactor - i.bumpTexCoord.w, 0)
                            + tangentNormal.xy);
                    foam += foamGradient * intensityFactor * _foam;
                }

The rule of thumb I go by is that if an if statement saves you 10 instructions, it’s worthwhile.

If you’re running on mobile and targeting GLES 2.0, then the if won’t save you anything as both sides do get executed and the above basically ends up doing this:

                float zdepth =
                    LinearEyeDepth (
                        tex2Dproj(_CameraDepthTexture, UNITY_PROJ_COORD(i.ref)).r);
                float waterDepthValue = zdepth - i.ref.w;

                float intensityFactor =
                    1.0f - clamp(waterDepthValue / _ShoreDistance, 0.0f, 1.0f);
                half3 foamGradient = _ShoreStrength -
                    tex2D(_FoamGradient,
                        float2(intensityFactor - i.bumpTexCoord.w, 0)
                        + tangentNormal.xy);
                foam += foamGradient * intensityFactor * _foam;
               
                if (waterDepthValue < -0.05f) {
                    foam = 0.0f;
                }

If you’re worried about it you can absolutely use #pragma multi_compile _ MSAA_FIXUP and an #if defined(MSAA_FIXUP) to skip the code. But I wouldn’t worry about it too much.
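For reference, a rough sketch of what that variant setup could look like, reusing the names from the snippet above. It assumes the keyword is turned on from script whenever MSAA is active (for example with Shader.EnableKeyword("MSAA_FIXUP")); the surrounding pass and fragment function are omitted, and the .rgb swizzle is my addition to make the half3 assignment explicit.

```hlsl
// Compiles two variants of the shader: one with no keyword, one with MSAA_FIXUP defined.
#pragma multi_compile _ MSAA_FIXUP

// ... inside the fragment function, after waterDepthValue has been computed:
float intensityFactor =
    1.0f - clamp(waterDepthValue / _ShoreDistance, 0.0f, 1.0f);
half3 foamGradient = _ShoreStrength -
    tex2D(_FoamGradient,
        float2(intensityFactor - i.bumpTexCoord.w, 0)
        + tangentNormal.xy).rgb;
foam += foamGradient * intensityFactor * _foam;

#if defined(MSAA_FIXUP)
    // This check only exists in the MSAA_FIXUP variant,
    // so the non-MSAA variant never pays for it.
    if (waterDepthValue < -0.05f) {
        foam = 0.0f;
    }
#endif
```

The trade-off is one extra shader variant to compile and keep in sync, which is probably only worth it if a profiler shows the check actually costs something.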


Thanks! That’s good to know, I will just keep it as is with the if-case for now then. It’s for PC only.

That’s not correct.

In your example, you sample a texture on one side of the branch, assuming that saving the texture sample is going to be faster. However, with the code as written, the GPU will do both sides of the branch and select the correct result. So your branch is not actually saving any performance in this case.

Why? Well, because GPUs don’t work on a single pixel at a time. Each execution unit always processes a 2x2 block of 4 pixels (a quad) at a time, so it can compute the screen-space derivatives it needs to filter textures properly. If a polygon edge cuts through that quad, for instance, the GPU may compute 4 pixels’ worth of data but only use one of them (because the other 3 are on the other side of the edge). (This is why microtriangles are bad.)

If you sample a texture in a branch, you have to do it with the gradient samplers (tex2Dgrad) instead of the regular ones (tex2D) and pass in the derivatives (which you can compute outside of the branch with ddx and ddy). If you don’t do this, then each pixel could produce a different result based on the branch, the filtering wouldn’t line up between samples, and you’d get artifacts. So if there is a tex2D sample in the branch, the compiler will force both sides to run. If you replace it with tex2Dgrad, then the compiler will decide whether it wants to branch there or not. And if you put UNITY_BRANCH (Unity’s macro for the HLSL [branch] attribute) before the if, it will force the branch to happen (and require tex2Dgrad or tex2Dlod to be used).

So, to enforce that the GPU actually branches, you’d have to change your code to something like:

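// Compute the derivatives outside the branch, where the whole quad still runs.
float2 dx = ddx(uv);
float2 dy = ddy(uv);

// UNITY_BRANCH already expands to [branch], so it takes no extra brackets.
UNITY_BRANCH
if (someval > 0)
{
    // Explicit gradients keep the filtering correct inside a real branch.
    result += tex2Dgrad(_Tex, uv, dx, dy);
}
else
{
    result = 0;
}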

The UNITY_BRANCH attribute tells the compiler that you really do want a branch there, and if you don’t use tex2Dgrad, the shader won’t compile, because you are asking for a real branch around a sample that needs derivatives from the whole quad.
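Applied to the water shader from the first post, that could look roughly like the following. This is an untested sketch: foamUV, foamDx and foamDy are names I made up for illustration, the .rgb swizzle is my addition, and UNITY_BRANCH is written as a bare macro because Unity defines it to expand to [branch].

```hlsl
float zdepth =
    LinearEyeDepth(
        tex2Dproj(_CameraDepthTexture, UNITY_PROJ_COORD(i.ref)).r);
float waterDepthValue = zdepth - i.ref.w;

// Everything the derivatives depend on has to be computed outside the branch.
float intensityFactor =
    1.0f - clamp(waterDepthValue / _ShoreDistance, 0.0f, 1.0f);
float2 foamUV =
    float2(intensityFactor - i.bumpTexCoord.w, 0) + tangentNormal.xy;
float2 foamDx = ddx(foamUV);
float2 foamDy = ddy(foamUV);

UNITY_BRANCH
if (waterDepthValue < -0.05f) {
    foam = 0.0f;
}
else {
    // tex2Dgrad uses the precomputed gradients, so it is legal inside a real branch.
    half3 foamGradient = _ShoreStrength -
        tex2Dgrad(_FoamGradient, foamUV, foamDx, foamDy).rgb;
    foam += foamGradient * intensityFactor * _foam;
}
```

Note that the only work left inside the else is a single texture sample and a few multiplies, so there is not much for the branch to skip here.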

Now, is this faster? It really depends. Let’s imagine that the value we are branching on comes from another texture read.

// The branch condition now depends on a texture read.
float someval = tex2D(_Tex2, uv).a;
float2 dx = ddx(uv);
float2 dy = ddy(uv);

UNITY_BRANCH
if (someval > 0)
{
    result += tex2Dgrad(_Tex, uv, dx, dy);
}
else
{
    result = 0;
}

In this case, we have two issues. First, the GPU must wait for the value of the first texture sample to finish before it can perform the branch (dependent texture read), and since you’ve forced a branch, it really does have to wait instead of computing both results and choosing one once that data is fetched from the texture. Second, if the values fetched from that texture change a lot, then the very limited branch predictor on a modern GPU will fail to provide any benefit, and for reasons of filtering, both sides may be run in areas where the value is changing. Had that branch been on a uniform (a property, which is the same for every pixel), then the branch predictor would have a 100% hit rate, and no quads would need to compute both sides of the branch to filter the texels correctly.

So here are my rules of thumb for if statements in a shader on modern GPUs:

  • Use them to select results at will, but expect both branches to run and do not use the branch attribute.
  • Use them when you know every pixel will take the same side of the branch.
  • Use them to avoid sampling many textures at once, but always force the branch to happen, and always time the difference in a GPU profiler to make sure it’s an actual speedup.
  • Avoiding a single texture sample or some math is less likely to be a win, so time things if you think it might be. These types of optimizations are often data and view dependent as well (large texture in high mip close to the camera vs. far away in low mip).

This obviously gets more difficult as you write shaders for a wider range of platforms. Running your shaders through a GPU frame capture, like the one provided in Instruments, is the only real way to know what’s going on.


Yep, I overlooked the texture sample in the if there. That will indeed cause it to act like my GLES 2.0 example even on desktop GPUs as @jbooth_1 describes.

TL;DR: Understanding the performance impact of if statements in shaders is complicated. Don’t be afraid to use them, but also don’t expect them to make things any faster.

GPU profilers are your best tool when trying to understand shader performance. I have come across many situations where the results were not what I was expecting, sometimes drastically so.