That’s not correct.
In your example, you sample a texture on one side of the branch, assuming that saving the texture sample is going to be faster. However, with the code as written, the GPU will do both sides of the branch and select the correct result. So your branch is not actually saving any performance in this case.
Why? Well, because GPUs don’t work on a single pixel at a time. What actually happens is that each execution unit always processes a 2×2 block of four pixels (a quad) at a time so it can properly filter the data. If a polygon edge cuts through that quad, for instance, the GPU may compute four pixels’ worth of data but only use one of them (because the other three are on the other side of the edge). This is why microtriangles are bad.
If you sample a texture in a branch, you have to do it with the gradient samplers (tex2Dgrad) instead of the regular ones (tex2D) and pass in the derivatives (which you can compute outside of the branch with ddx and ddy). If you don’t do this, then each pixel could produce a different result based on the branch, the filtering wouldn’t line up between samples, and you’d get artifacts. So if there is a tex2D sample in the branch, the compiler will force both sides to run. If you replace it with tex2Dgrad, then the compiler will decide whether it wants to branch there or not. And if you put UNITY_BRANCH (Unity’s macro for the HLSL [branch] attribute) before the if, it will force the branch to happen (and require tex2Dgrad or tex2Dlod to be used).
So, to enforce that the GPU actually branches, you’d have to change your code to something like:
// Compute the UV derivatives outside the branch (assuming uv is a float2).
float2 dx = ddx(uv);
float2 dy = ddy(uv);
UNITY_BRANCH
if (someval > 0)
{
    result += tex2Dgrad(_Tex, uv, dx, dy);
}
else
{
    result = 0;
}
The UNITY_BRANCH attribute tells the compiler that you really do want a branch there, and if you don’t use tex2Dgrad, the shader won’t compile, because you are telling the compiler to do a real branch.
Now, is this faster? It really depends. Let’s imagine that the value we are branching on comes from another texture read.
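// The branch condition now depends on a texture read.
float someval = tex2D(_Tex2, uv).a;
// Compute the UV derivatives outside the branch (assuming uv is a float2).
float2 dx = ddx(uv);
float2 dy = ddy(uv);
UNITY_BRANCH
if (someval > 0)
{
    result += tex2Dgrad(_Tex, uv, dx, dy);
}
else
{
    result = 0;
}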
In this case, we have two issues. First, the GPU must wait for the value of the first texture sample to finish before it can perform the branch (dependent texture read), and since you’ve forced a branch, it really does have to wait instead of computing both results and choosing one once that data is fetched from the texture. Second, if the values fetched from that texture change a lot, then the very limited branch predictor on a modern GPU will fail to provide any benefit, and for reasons of filtering, both sides may be run in areas where the value is changing. Had that branch been on a uniform (a property, which is the same for every pixel), then the branch predictor would have a 100% hit rate, and no quads would need to compute both sides of the branch to filter the texels correctly.
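For contrast, here is a minimal sketch of the uniform case. The property name _UseDetail and the function SampleDetail are made up for this example, and the fragment assumes it sits inside a Unity CGPROGRAM block where UNITY_BRANCH is defined. Since _UseDetail has the same value for every pixel in the draw call, every quad takes the same side of the branch.

```hlsl
sampler2D _Tex;
float _UseDetail; // hypothetical material property (0 or 1), identical for every pixel

// Hypothetical helper: samples _Tex only when the uniform says so.
float4 SampleDetail(float2 uv)
{
    float4 result = 0;

    // Derivatives are computed outside the branch, as in the code above.
    float2 dx = ddx(uv);
    float2 dy = ddy(uv);

    UNITY_BRANCH
    if (_UseDetail > 0.5)
    {
        result = tex2Dgrad(_Tex, uv, dx, dy);
    }
    return result;
}
```

Whether skipping the sample pays off here still depends on the hardware, so treat this as a starting point for profiling rather than a guaranteed win.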
So here are my rules of thumb for using if in a shader on modern GPUs:
- Use them to select results at will, but expect both branches to run, and do not use the branch attribute.
- Use them when you know every pixel will take the branch the same way (for example, when branching on a uniform).
- Use them to avoid sampling many textures at once, but always force the branch to happen, and always time the difference in a GPU profiler to make sure it’s an actual speedup.
- Avoiding a single texture sample or some math is less likely to be a win, so time things if you think it might be. These types of optimizations are often data and view dependent as well (large texture in high mip close to the camera vs. far away in low mip).
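As a rough illustration of the first and third rules, here is a sketch in the same style as the code above. All of the names (_Mask, _TexA, _TexB, _TexC, SelectResult, BlendLayers) are invented for the example, and it assumes a Unity CGPROGRAM block where UNITY_BRANCH is defined.

```hlsl
sampler2D _Mask;
sampler2D _TexA;
sampler2D _TexB;
sampler2D _TexC;

// Rule 1: a plain select. No branch attribute, and both samples are
// expected to run before one result is chosen.
float4 SelectResult(float2 uv)
{
    float mask = tex2D(_Mask, uv).a;
    float4 a = tex2D(_TexA, uv);
    float4 b = tex2D(_TexB, uv);
    return (mask > 0) ? a : b;
}

// Rule 3: a forced branch around several samples. The derivatives are
// taken outside the branch and every sample inside uses tex2Dgrad.
float4 BlendLayers(float2 uv)
{
    float mask = tex2D(_Mask, uv).a;
    float2 dx = ddx(uv);
    float2 dy = ddy(uv);
    float4 result = 0;

    UNITY_BRANCH
    if (mask > 0)
    {
        result  = tex2Dgrad(_TexA, uv, dx, dy);
        result += tex2Dgrad(_TexB, uv, dx, dy);
        result += tex2Dgrad(_TexC, uv, dx, dy);
    }
    return result;
}
```

The second function still pays for the dependent read of _Mask, so it is only likely to help when the mask is zero over large, contiguous areas of the screen.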
This obviously gets more difficult as you write shaders for a wider range of platforms. Running your shaders through a GPU frame capture, like the one provided in Instruments, is the only real way to know what’s going on.