How to determine when it is worth it to move calculations to vertex shader?

Is it always better to move as much calculation as possible into the vertex shader? Is it worth it for something as simple as: o.color = _Color1 * _Color2; How do you determine the cost-benefit?

For modern desktop GPUs, almost never.

For mobile GPUs, it depends on the GPU, but also almost never.

The idea that doing more in the vertex shader is always faster than doing it in the fragment shader stopped being true for desktop-class GPUs over 15 years ago. The only time it’s worthwhile to do the work in the vertex shader is when doing so doesn’t require passing any more data to the fragment shader than you already were. For example, if you’re making use of the mesh’s per-vertex color and you also have a material color property, there can be a benefit to multiplying the two in the vertex shader and passing the result to the fragment shader. This might have a noticeable effect on performance on mobile, but it is negligible on desktop.
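To make the vertex color case concrete, here is a minimal sketch of a built-in render pipeline shader that does the multiply in the vertex shader. The shader name and the `_Color` property name are placeholders I picked for illustration, not something from a real project. The key point is that the `v2f` struct carries one color interpolator whether or not the tint is applied in the vertex shader, so the multiply adds no extra data.

```hlsl
Shader "Hypothetical/VertexColorTint"
{
    Properties
    {
        _Color ("Tint", Color) = (1,1,1,1) // assumed material color property
    }
    SubShader
    {
        Pass
        {
            CGPROGRAM
            #pragma vertex vert
            #pragma fragment frag
            #include "UnityCG.cginc"

            fixed4 _Color;

            struct appdata
            {
                float4 vertex : POSITION;
                fixed4 color  : COLOR;   // per-vertex color from the mesh
            };

            struct v2f
            {
                float4 pos   : SV_POSITION;
                fixed4 color : COLOR;    // already needed to pass the vertex color
            };

            v2f vert (appdata v)
            {
                v2f o;
                o.pos = UnityObjectToClipPos(v.vertex);
                // Tint per vertex: same interpolator count as passing v.color alone.
                o.color = v.color * _Color;
                return o;
            }

            fixed4 frag (v2f i) : SV_Target
            {
                return i.color;
            }
            ENDCG
        }
    }
}
```

By contrast, something like `_Color1 * _Color2` with two material properties has no per-vertex input at all, so moving it to the vertex shader would mean adding an interpolator just to carry a constant.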

As for determining the cost-benefit: by trying it and benchmarking, specifically by looking at GPU performance using external tools. The Unity Editor’s GPU profiling is also fairly decent, at least for in-editor profiling, but external tools will be more accurate.

A little bit of extra data here to give you an idea of how “almost never” this is.

Around the late 2000s, some graphics programmers figured out that each extra float4 worth of data passed from the vertex shader to the fragment shader cost the equivalent of somewhere around 6 or more instructions in the fragment shader. For desktop GPUs that has only gotten worse since then, as memory bandwidth improvements continue to lag behind raw computation speed.

The “late 2000s” thing they were doing that led them to notice this was … normal mapping. Traditional normal mapping usually requires passing the per-vertex normal, tangent, and bitangent vectors from the vertex shader to the fragment shader. However, you can calculate the tangent and bitangent in the fragment shader if you have the UV and world position, using some “expensive” (by late 2000s standards) math and pixel derivatives. So a few people asked, “might it be faster to do that instead?” And the answer was … yes, it was. Or at least it wasn’t any slower. So several PS3-era games did just that. It also meant you didn’t need meshes to have tangents, which means you didn’t need to store them or even calculate them beforehand, nor did you have to pay the extra cost of updating them for things like skinned meshes.
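For anyone who hasn’t seen the derivative approach, here is a rough sketch of one commonly used construction. I’m not claiming this is exactly what those games shipped. The function and parameter names are made up for illustration, and depending on platform and UV conventions you may need to flip the sign of the bitangent or of the `ddy` terms.

```hlsl
// Sketch: build a world-space tangent frame per pixel from screen-space
// derivatives, instead of interpolating per-vertex tangents and bitangents.
// worldNormal   : interpolated vertex normal (world space)
// worldPos      : interpolated world position
// uv            : texture coordinate used for the normal map
// tangentNormal : unpacked tangent-space normal from the normal map
// Note: if the UV derivatives are zero (degenerate UVs), T and B are zero
// and the rsqrt below is undefined; real shaders usually guard against that.
float3 PerturbNormalFromDerivatives(float3 worldNormal, float3 worldPos,
                                    float2 uv, float3 tangentNormal)
{
    float3 N = normalize(worldNormal);

    // How position and UV change between neighboring pixels.
    float3 dp1  = ddx(worldPos);
    float3 dp2  = ddy(worldPos);
    float2 duv1 = ddx(uv);
    float2 duv2 = ddy(uv);

    // Solve for the directions in which U and V increase across the surface.
    float3 dp2perp = cross(dp2, N);
    float3 dp1perp = cross(N, dp1);
    float3 T = dp2perp * duv1.x + dp1perp * duv2.x;
    float3 B = dp2perp * duv1.y + dp1perp * duv2.y;

    // Scale both by the same factor so the frame is independent of UV scale.
    float invMax = rsqrt(max(dot(T, T), dot(B, B)));
    T *= invMax;
    B *= invMax;

    // Rotate the tangent-space normal into world space.
    return normalize(tangentNormal.x * T + tangentNormal.y * B + tangentNormal.z * N);
}
```

It is clearly more ALU work per pixel than reading three interpolated vectors, which is the trade being discussed here.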

So comparing a shader that passed a float2 UV and three float3 vectors (a normal, tangent, and bitangent) to one that passed a float2 UV and two float3 vectors (normal and position) and did a lot more math in the fragment shader … the latter was slightly faster. In 2009. It was even faster in comparison if the shader already needed the position data for other reasons, meaning you were removing two float3 vectors’ worth of data from the original example.
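Written out as vertex-to-fragment structs, the two setups being compared look roughly like this. The struct and field names are placeholders of mine, and the clip-space position is left out of the float counts since both variants need it.

```hlsl
// Variant A: traditional normal mapping, tangent frame comes from the mesh.
// Interpolated data: 2 + 3 + 3 + 3 = 11 floats.
struct v2f_meshTangents
{
    float4 pos            : SV_POSITION;
    float2 uv             : TEXCOORD0;
    float3 worldNormal    : TEXCOORD1;
    float3 worldTangent   : TEXCOORD2;
    float3 worldBitangent : TEXCOORD3;
};

// Variant B: tangent frame rebuilt in the fragment shader from derivatives.
// Interpolated data: 2 + 3 + 3 = 8 floats.
struct v2f_derivedTangents
{
    float4 pos         : SV_POSITION;
    float2 uv          : TEXCOORD0;
    float3 worldNormal : TEXCOORD1;
    float3 worldPos    : TEXCOORD2; // often needed anyway, e.g. for lighting
};
```

If `worldPos` is already being passed for other reasons, variant B effectively drops both the tangent and the bitangent for free.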

Testing today with a similar setup, where one shader does the equivalent of Unity’s BIRP normal mapping and just outputs the resulting world space normal, and the other uses an even more expensive way of calculating the tangent in the fragment shader, I cannot measure a meaningful difference in performance between the two. That’s even though the more expensive fragment shader has 22 more instructions (34 vs 12), and even when I have the fragment shaders of both set up in a way so that they’re both exactly the same. That’s kind of the current state of desktop GPUs: increasing the per-fragment instruction count by 22 instructions can’t move the needle on how long it takes to draw the mesh, because everything else involved in rendering takes most of the time.
