How smart is the compiler?

As far as I know, the shader compiler is able to optimize code, e.g. by stripping out unnecessary code or precomputing predictable results, but to what extent?
Consider the following two vertex shaders:

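// Shader 1: clip position computed from object space
v2f o;
o.color = mul(unity_ObjectToWorld, v.vertex);
o.vertex = UnityObjectToClipPos(v.vertex); //<--- Calculate from object space
return o;

// Shader 2: clip position computed from the world space position above
v2f o;
o.color = mul(unity_ObjectToWorld, v.vertex);
o.vertex = UnityWorldToClipPos(o.color); //<--- Start from the already calculated world space position
return o;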

Both shaders calculate the world space position, but the first one uses UnityObjectToClipPos(), which does its own world space calculation internally, whereas the second one directly uses UnityWorldToClipPos(). I would expect the compiler to detect the repeated pattern in the first shader and optimize it into something like the second one. However, looking at the compiled code, the second one definitely has fewer instructions than the first one. So my question is: to what extent can I rely on the compiler, and is there a general rule of thumb for writing optimized shader code?

Compilers are relatively smart and will remove duplicate code that is exactly the same.

The issue with your first and second examples is that they are not exactly the same. Specifically, the first example calculates the world space position in two different ways. Let's look at how the UnityObjectToClipPos() function calculates the world position.

// Transforms position from object to homogeneous space
inline float4 UnityObjectToClipPos(in float3 pos)
{
    // More efficient than computing M*VP matrix product
    return mul(UNITY_MATRIX_VP, mul(unity_ObjectToWorld, float4(pos, 1.0)));
}
inline float4 UnityObjectToClipPos(float4 pos) // overload for float4; avoids "implicit truncation" warning for existing shaders
{
    return UnityObjectToClipPos(pos.xyz);
}

The important line is this one: mul(unity_ObjectToWorld, float4(pos, 1.0))
Your example uses mul(unity_ObjectToWorld, v.vertex), and while v.vertex.w is always set to 1.0 by Unity, the compiler can't know that. So the first example is really doing:

o.color = mul(unity_ObjectToWorld, v.vertex);
float4 worldPos = mul(unity_ObjectToWorld, float4(v.vertex.xyz, 1.0));
o.vertex = mul(UNITY_MATRIX_VP, worldPos);
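To see why these two products can't simply be merged, write M = unity_ObjectToWorld by its columns m_0 to m_3 and v.vertex = (x, y, z, w). This column notation is introduced here only for illustration; it follows from HLSL's mul(matrix, vector) treating the vector as a column:

$$
\begin{aligned}
\mathrm{mul}(M, \mathbf{v}) &= x\,\mathbf{m}_0 + y\,\mathbf{m}_1 + z\,\mathbf{m}_2 + w\,\mathbf{m}_3 \\
\mathrm{mul}\big(M, (x, y, z, 1)\big) &= x\,\mathbf{m}_0 + y\,\mathbf{m}_1 + z\,\mathbf{m}_2 + \mathbf{m}_3
\end{aligned}
$$

The two results only agree for every M when w = 1. Since nothing tells the compiler that v.vertex.w is 1.0, it has to treat them as different values.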

There is still some overlap, which the compiler is doing its best to optimize, which is why the compiled shader shows only 1 more math operation instead of many more.
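A plausible explanation for that small difference (a sketch, not a reading of the actual compiled output): with M = unity_ObjectToWorld written by columns m_0 to m_3 and v.vertex = (x, y, z, w), the first three terms of both products are identical, so they can be computed once and shared:

$$
\begin{aligned}
\mathbf{p} &= x\,\mathbf{m}_0 + y\,\mathbf{m}_1 + z\,\mathbf{m}_2 \\
\text{o.color} &= \mathbf{p} + w\,\mathbf{m}_3 \\
\text{worldPos} &= \mathbf{p} + \mathbf{m}_3
\end{aligned}
$$

Only the final term differs, so producing both values can cost as little as one extra vector operation on top of the shared partial sum p.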

If you update the first example to:

o.color = mul(unity_ObjectToWorld, float4(v.vertex.xyz, 1.0));
o.vertex = UnityWorldToClipPos(o.color);

The two options should then have the same number of math operations … or possibly be even faster.
