Performance test of selected HLSL intrinsic functions - results

List of all functions:

Resolution: 8K
Unity 2022.3.11
RTX 2070 + Windows 10

Test shader:

Shader "Performance"
{
    SubShader
    {
        Pass
        {
            HLSLPROGRAM
            #include "UnityCG.cginc"
            sampler2D _MainTex;
           
            #pragma vertex VSMain
            float4 VSMain (float3 vertex : POSITION) : SV_Position
            {
                return UnityObjectToClipPos(vertex);
            }

            #pragma fragment PSMain
            float4 PSMain (float4 vertex : SV_POSITION) : SV_Target
            {
                float r = 0;
                for (float i = 0.0; i < 10000.0; i++) r = r + abs(i);
                return float4(r, 0.0, 0.0, 1.0);
            }

            ENDHLSL
        }
    }
}

Replace line 21 of the shader (the for loop in PSMain) with the selected function:

abort(); for (float i = 0.0; i < 10000.0; i++) r = r + asin(i);
for (float i = 0.0; i < 10000.0; i++) r = r + abs(i);
for (float i = 0.0; i < 10000.0; i++) r = r + acos(i);
for (int i = 0; i < 10000; i++) r = r + asfloat(i);
for (float i = 0.0; i < 10000.0; i++) r = r + atan(i);
for (float i = 0.0; i < 10000.0; i++) r = r + ceil(i);
for (float i = 0.0; i < 10000.0; i++) r = r + cos(i);
for (float i = 0.0; i < 10000.0; i++) r = r + cosh(i);
for (float i = 0.0; i < 10000.0; i++) r = r + ddx(i);
for (float i = 0.0; i < 10000.0; i++) r = r + degrees(i);
discard; for (float i = 0.0; i < 10000.0; i++) r = r + asin(i);
for (float i = 0.0; i < 10000.0; i++) r = r + fmod(i, 123.456);
for (float i = 0.0; i < 10000.0; i++) r = r + log(i);
for (float i = 0.0; i < 10000.0; i++) r = r + mul(unity_ObjectToWorld, vertex).r;
for (float i = 0.0; i < 10000.0; i++) r = r + normalize(i);
for (float i = 0.0; i < 10000.0; i++) r = r + normalize(vertex);
for (float i = 0.0; i < 10000.0; i++) r = r + rsqrt(i);
for (float i = 0.0; i < 10000.0; i++) r = r + sign(i);
for (float i = 0.0; i < 10000.0; i++) r = r + tan(i);
for (float i = 0.0; i < 10000.0; i++) r = r + tex2D(_MainTex, vertex.xy);

Results (numbers are FPS; higher is better):

The most demanding HLSL functions are the inverse trigonometric functions: asin, acos, atan.

Interesting experiment. Yes, inverse transcendentals are known to be the most expensive.

I didn’t expect normalize to be so expensive, though. That’s such a common function that I would have thought they’d have optimized it with a lookup table. What does normalize(scalar) even do? Does it always return 1? Also, you need to be careful with normalize(vertex), because the argument does not change inside the loop and a good compiler would simply pull the call out of the loop.
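A minimal sketch of one way to keep normalize inside the loop, assuming the same test shader as above. The `vertex.xyz + i` argument is an assumption made for illustration, not part of the original benchmark; it makes the input differ on every iteration so the call is no longer loop-invariant. A compiler may still transform the loop in other ways, so this only removes the most obvious shortcut.

```hlsl
// Hypothetical variant of PSMain from the test shader above.
// The normalize() argument depends on the loop counter, so the
// result cannot be computed once and reused for all iterations.
float4 PSMain (float4 vertex : SV_POSITION) : SV_Target
{
    float r = 0;
    for (float i = 0.0; i < 10000.0; i++)
    {
        // vertex.xyz + i adds the scalar i to each component,
        // giving a different float3 on every iteration.
        r = r + normalize(vertex.xyz + i).x;
    }
    return float4(r, 0.0, 0.0, 1.0);
}
```

Compared with the original normalize(vertex) line, this also adds one vector addition per iteration, so the two numbers are not directly comparable.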

Would also be interesting to see the cost of a division. Might be similar to the cost of fmod.
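A division test line in the same format as the list above might look like the following. This is a hypothetical addition that was not part of the original measurements; the constant 123.456 is borrowed from the fmod line, and the `+ 1.0` is an assumption added only to avoid dividing by zero on the first iteration.

```hlsl
// Hypothetical replacement for the loop in PSMain: one division per iteration.
for (float i = 0.0; i < 10000.0; i++) r = r + 123.456 / (i + 1.0);
```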

How did you measure the cost of discard? That will end the shader as soon as you call it.

Well, the fact that mul is an order of magnitude slower than rsqrt is enough to conclude that this benchmark is broken.
Respect for the effort, though; it is hard to measure any kind of workload on a GPU.

I’m sure that would be a petabyte-sized look-up table.

You are right, unfortunately. I hadn’t even noticed that.

… or maybe a Taylor series approximation. There seems to be at least a Special Function Unit (SFU) for certain operations.

It is also worth pointing out that the GPU can process multiple instructions at the same time. GPUs can have scalar and vector units as well as special instructions such as fused multiply-add (FMA). Latency can also often be hidden by running another shader at the same time (this is where occupancy comes into play).

It would still be interesting to know the general ballpark cost of the instructions. It is hard to find good information on that.

There are Shader Graph heatmap colors in 2023.3 and up: https://discussions.unity.com/t/931489

Oh wow, I didn’t know this. So Unity has already done this work for us. Here are the standard values that Unity has measured.

By the way, NVIDIA’s Shader Profiler can give you an exact breakdown of the cost, but I’ve never gotten it to work when I needed it.
