Optimizing PBR vertex shader

I’m looking for some help optimizing my PBR vertex shader. Right now, on Metal, it compiles down to 4 matrix multiplies. I’m trying to see if there is a way to get that number down, either by cutting some calculations or by changing the fragment shader to operate in a different space.

The 4 matrix multiplies are: 2 to get object positions into clip space, and 1 each to transform the tangent and the normal from object space into world space.

One thing right off the bat: it seems like I should be able to use a single matrix to go from object space to clip space, right? At least I would think so. But in the compiled shader, the position is first multiplied by ObjectToWorld and then by the VP matrix.
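The two-step form appears to be deliberate. In the Built-in pipeline include files, UnityObjectToClipPos is written as mul(UNITY_MATRIX_VP, mul(unity_ObjectToWorld, float4(pos, 1.0))) with a comment saying this is more efficient than computing the M*VP product, and UNITY_MATRIX_MVP is itself a macro for mul(unity_MatrixVP, unity_ObjectToWorld). Using it would therefore add a matrix-matrix product per vertex. Because the macro also writes o.worldPosition, the compiler can share the ObjectToWorld multiply between the two outputs. The C++ below is a minimal CPU-side model of that arithmetic, not shader code. It checks that both orderings give the same clip position and tallies multiply-adds, assuming that a combined matrix formed inside the shader is evaluated per vertex. All names in it (mulMV, mulMM, sameClipPos, the k* constants) are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// CPU-side model of the position transform. Column vectors, so
// mulMV(m, v) corresponds to HLSL's mul(m, v).
using Vec4 = std::array<float, 4>;
using Mat4 = std::array<Vec4, 4>; // m[row][col]

// Matrix * vector: 16 multiply-adds.
Vec4 mulMV(const Mat4& m, const Vec4& v) {
    Vec4 r{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            r[i] += m[i][j] * v[j];
    return r;
}

// Matrix * matrix: 64 multiply-adds.
Mat4 mulMM(const Mat4& a, const Mat4& b) {
    Mat4 r{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                r[i][j] += a[i][k] * b[k][j];
    return r;
}

// True if VP * (M * p) matches (VP * M) * p within a small tolerance.
bool sameClipPos(const Mat4& vp, const Mat4& m, const Vec4& p) {
    Vec4 twoStep = mulMV(vp, mulMV(m, p));
    Vec4 combined = mulMV(mulMM(vp, m), p);
    for (int i = 0; i < 4; ++i)
        if (std::fabs(twoStep[i] - combined[i]) > 1e-4f) return false;
    return true;
}

// Per-vertex multiply-adds when the world position is also needed
// (o.worldPosition in the macro).
// Two-step: M*p (reused for worldPosition) + VP*worldPos.
constexpr int kTwoStepMads = 16 + 16;
// MVP formed inside the shader: VP*M + MVP*p, plus M*p for worldPosition.
constexpr int kPerVertexMvpMads = 64 + 16 + 16;
// MVP precomputed on the CPU: MVP*p, plus M*p for worldPosition.
constexpr int kUploadedMvpMads = 16 + 16;
```

Under these assumptions, even a CPU-uploaded MVP leaves the count at two matrix-vector products as long as the fragment shader needs the world position, so the "extra" multiply is really the world-position output.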

I was also thinking I could change the frag shader to operate in tangent space… but then I would still have to convert the light direction (I’m only using 1 directional light) and the view direction into tangent space, so I think it would end up being the same number of matrix multiplies…
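The tangent-space reasoning can be checked with a small C++ model (hypothetical helper names, not shader code). For an orthonormal T, B, N basis, moving a world-space vector into tangent space is a multiply by the transpose of the TBN matrix: three dot products, i.e. one 3x3 multiply per vector. Lighting dot products come out the same in either space, so the switch buys nothing by itself. Converting L and V costs two 3x3 multiplies, and the world-space (or object-space) T and N are still needed to build the basis, which supports the estimate that the count does not drop.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<float, 3>;

float dot3(const Vec3& a, const Vec3& b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

// World -> tangent: project onto the world-space T, B, N axes.
// This is transpose(TBN) * v: three dot products, one 3x3 multiply.
Vec3 worldToTangent(const Vec3& v, const Vec3& t, const Vec3& b, const Vec3& n) {
    return {dot3(v, t), dot3(v, b), dot3(v, n)};
}

// Tangent -> world: TBN * v = t*x + b*y + n*z, also one 3x3 multiply.
Vec3 tangentToWorld(const Vec3& v, const Vec3& t, const Vec3& b, const Vec3& n) {
    Vec3 r{};
    for (int i = 0; i < 3; ++i)
        r[i] = t[i] * v[0] + b[i] * v[1] + n[i] * v[2];
    return r;
}
```

With an unperturbed normal, (0, 0, 1) in tangent space, dot(N, L) in world space equals dot((0, 0, 1), L_tangent). Only the space changes, not the amount of work.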

Another idea was to simplify things by assuming uniform scale, but that doesn’t seem like it would change the number or complexity of the calculations required either.
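The uniform-scale intuition holds up in a quick model. Normals need the inverse transpose of the object-to-world 3x3. If that matrix is s*R (uniform scale times rotation), its inverse transpose is (1/s)*R, which after normalization points the same way as the result of using the matrix directly. In UnityCG.cginc this is what the UNITY_ASSUME_UNIFORM_SCALING define switches on: UnityObjectToWorldNormal then reuses UnityObjectToWorldDir instead of mul(norm, (float3x3)unity_WorldToObject). It is still one 3x3 multiply per vector, so at most it saves reading unity_WorldToObject. The C++ below is a CPU-side sketch with hypothetical names that mirrors those two paths.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<float, 3>;
using Mat3 = std::array<Vec3, 3>; // m[row][col]

Vec3 normalize3(const Vec3& v) {
    float len = std::sqrt(v[0] * v[0] + v[1] * v[1] + v[2] * v[2]);
    return {v[0] / len, v[1] / len, v[2] / len};
}

// Like UnityObjectToWorldDir: normalize(mul((float3x3)objectToWorld, dir)).
Vec3 objectToWorldDir(const Mat3& objectToWorld, const Vec3& dir) {
    Vec3 r{};
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            r[i] += objectToWorld[i][j] * dir[j];
    return normalize3(r);
}

// Like the general path of UnityObjectToWorldNormal:
// normalize(mul(n, (float3x3)worldToObject)), i.e. the inverse transpose.
Vec3 objectToWorldNormal(const Mat3& worldToObject, const Vec3& n) {
    Vec3 r{};
    for (int j = 0; j < 3; ++j)
        for (int i = 0; i < 3; ++i)
            r[j] += n[i] * worldToObject[i][j];
    return normalize3(r);
}

bool nearlyEqual(const Vec3& a, const Vec3& b) {
    for (int i = 0; i < 3; ++i)
        if (std::fabs(a[i] - b[i]) > 1e-5f) return false;
    return true;
}
```

With a uniform scale of 2 and a 90° rotation, both paths agree. With a non-uniform scale such as (1, 2, 1) they diverge, which is why the shortcut is only safe under the uniform-scale assumption.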

Any help would be very much appreciated!

```hlsl
// Vertex-stage lighting setup (Built-in pipeline, UnityCG.cginc).
// calculateTSpace must be declared before the vertex function that expands this macro.
#define CUSTOM_LIGHTING_VERTEX(v, o) \
    o.vertex = UnityObjectToClipPos(v.vertex); \
    o.worldPosition = mul(unity_ObjectToWorld, v.vertex); \
    o.normal = UnityObjectToWorldNormal(v.normal); /*mul(unity_ObjectToWorld, half4(v.normal, 0));*/ \
    o.uv = v.uv; \
    calculateTSpace(o.normal, v.tangent, o.tspace0, o.tspace1, o.tspace2); \
    o.uv2.xy = v.uv2.xy * unity_LightmapST.xy + unity_LightmapST.zw; \
    o.fogCoord = o.vertex.zw;

// Packs the world-space TBN basis as three rows for the fragment shader:
// tspaceN holds component N of the tangent, bitangent and normal.
void calculateTSpace(half3 worldNormal, half4 vTangent, out half3 tspace0, out half3 tspace1, out half3 tspace2)
{
#if defined(_NORMALMAP) || defined(_DETAIL_MULX2)
    half3 wTangent = UnityObjectToWorldDir(vTangent.xyz);
    // vTangent.w stores the bitangent handedness; unity_WorldTransformParams.w
    // is -1 for negatively scaled (mirrored) objects.
    half tangentSign = vTangent.w * unity_WorldTransformParams.w;
    half3 wBitangent = cross(worldNormal, wTangent) * tangentSign;
    tspace0 = half3(wTangent.x, wBitangent.x, worldNormal.x);
    tspace1 = half3(wTangent.y, wBitangent.y, worldNormal.y);
    tspace2 = half3(wTangent.z, wBitangent.z, worldNormal.z);
#else
    // Without _NORMALMAP or _DETAIL_MULX2 the TBN rows are zeroed.
    tspace0 = half3(0, 0, 0);
    tspace1 = half3(0, 0, 0);
    tspace2 = half3(0, 0, 0);
#endif
}
```
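The fragment shader isn't shown in the thread, so this sketch assumes it follows the usual pattern from Unity's Built-in pipeline shader examples: the world normal is rebuilt from a tangent-space normal with one dot product per tspace row, e.g. worldNormal.x = dot(tspace0, tnormal). The C++ below models calculateTSpace and that reconstruction on the CPU. packTSpace and tangentNormalToWorld are hypothetical names, and tangentSign stands in for vTangent.w * unity_WorldTransformParams.w.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<float, 3>;

Vec3 cross3(const Vec3& a, const Vec3& b) {
    return {a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]};
}

float dot3(const Vec3& a, const Vec3& b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

// Mirrors calculateTSpace: row k holds component k of (T, B, N),
// with B = cross(N, T) * tangentSign.
std::array<Vec3, 3> packTSpace(const Vec3& worldNormal, const Vec3& worldTangent,
                               float tangentSign) {
    Vec3 b = cross3(worldNormal, worldTangent);
    for (float& c : b) c *= tangentSign;
    std::array<Vec3, 3> rows{};
    for (int k = 0; k < 3; ++k)
        rows[k] = {worldTangent[k], b[k], worldNormal[k]};
    return rows;
}

// Assumed fragment-side step: one dot product per row turns a
// tangent-space normal (e.g. sampled from a normal map) into world space.
Vec3 tangentNormalToWorld(const std::array<Vec3, 3>& rows, const Vec3& tn) {
    return {dot3(rows[0], tn), dot3(rows[1], tn), dot3(rows[2], tn)};
}
```

A flat tangent-space normal (0, 0, 1) maps back to N, (0, 1, 0) maps to B, and flipping tangentSign mirrors B. This is why the vertex stage only needs the normal and tangent transforms; the rest of the basis costs one cross product plus three interpolators.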

That’s already essentially the bare bones. There isn’t anything else of significance you can optimize away without breaking things, or without likely making the fragment shader slower than whatever you would save.

I’d say you’re focusing on optimizing the wrong thing. If vertex shader time is a concern, you should be looking at reducing your vertex count.


Fair enough. We decided to spend some time reducing vertex counts, and we’ve also made some lower-level optimizations to the shaders that had a significant enough impact on performance.