While reading through the shader code in URP, I noticed that divisions are often written as

```hlsl
float result = dividend * rcp(divisor);
```
I understand that this is supposed to be faster. The HLSL documentation describes `rcp` as follows:

> **rcp**
>
> Calculates a fast, approximate, per-component reciprocal.
However, in a few quick tests I could not see any difference in the compiled code:
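```hlsl
// Variant 1: reciprocal via the rcp intrinsic
fixed4 frag(v2f i) : SV_Target
{
    float a = rcp(i.a.x);
    return fixed4(a, 1, 2, 3);
}
```

```hlsl
// Variant 2: plain division
fixed4 frag(v2f i) : SV_Target
{
    float a = 1.0 / i.a.x;
    return fixed4(a, 1, 2, 3);
}
```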
Both of the above compile to the following:
```
ps_4_0
dcl_input_ps linear v1.x
dcl_output o0.xyzw
   0: div o0.x, l(1.000000, 1.000000, 1.000000, 1.000000), v1.x
   1: mov o0.yzw, l(0,1.000000,2.000000,3.000000)
   2: ret
```
Under what circumstances does rcp make a difference?