Forward lights break static batching - but why? (Custom Shader / lighting)

If I have a scene with some objects all using the same material and only directional lighting, it gets rendered in one batch, one setPass-call, no problem.

As soon as I add a single point or spotlight, batching breaks completely. I’ve tried finding the reason for this, but all sources I could find basically just say “it just does” or “unity can’t batch multi pass shaders”. Even the forward base pass, which should only be using the directional light, now issues a ton of setPass-calls, where previously only a single one was needed, even though the gpu data stays the same for all these calls. I’ve checked the frame debugger of course, which says “objects are affected by different forward lights” as batch-breaking reason, but why is this a thing? The materials, shader, pass, gpu data, vertices, textures, matrices, everything that could be different to the previous draw call and require a setPass-call is the exact same, so what is there to “set”? I don’t understand. Forward base pass doesn’t even use these additional forward lights.

Forward-add-pass draw calls, of course, don’t get batched either - another thing I can’t wrap my head around. These are additive passes, separate from the base pass, and for the same spot or point light, their draw calls again share all gpu data etc between them. Can’t be batched though, instead every draw call needs a setPass-call, to replace the current render state with seemingly the exact same data.

The funny thing is, I can at least “fix” the broken batching for the forward base pass by splitting my passes into two shaders, one only containing the base pass with only directional light support, and one containing only the forward add pass, then rendering all objects with both shaders. Now, the base pass at least gets batched again with a single setPass-call and isn’t disturbed by the additional forward lights. But this seems like a dumb workaround and also doesn’t fix batching of the forward add pass.
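For illustration, the base-only half of such a split might look like the sketch below. The shader name, the `_Color` property, and the simple Lambert lighting are assumptions, not the shader actually used in this thread. The only point is the structure: this shader contains a single pass tagged ForwardBase, and the additive lighting would live in a second shader that contains only a ForwardAdd pass.

```hlsl
// Sketch only: hypothetical base-only shader (name, _Color and the Lambert term are assumptions).
Shader "Hypothetical/ForwardBaseOnly"
{
    Properties
    {
        _Color ("Color", Color) = (1, 1, 1, 1)
    }
    SubShader
    {
        Tags { "RenderType" = "Opaque" }

        Pass
        {
            // The only pass in this shader: main directional light plus ambient.
            Tags { "LightMode" = "ForwardBase" }

            CGPROGRAM
            #pragma vertex vert
            #pragma fragment frag
            #include "UnityCG.cginc"
            #include "UnityLightingCommon.cginc" // declares _LightColor0

            fixed4 _Color;

            struct v2f
            {
                float4 pos      : SV_POSITION;
                float3 normalWS : TEXCOORD0;
            };

            v2f vert(appdata_base v)
            {
                v2f o;
                o.pos = UnityObjectToClipPos(v.vertex);
                o.normalWS = UnityObjectToWorldNormal(v.normal);
                return o;
            }

            fixed4 frag(v2f i) : SV_Target
            {
                float3 n = normalize(i.normalWS);
                // In ForwardBase, _WorldSpaceLightPos0.xyz holds the directional light's direction.
                float ndotl = saturate(dot(n, _WorldSpaceLightPos0.xyz));
                fixed3 ambient = ShadeSH9(half4(n, 1));
                fixed3 lit = _Color.rgb * (_LightColor0.rgb * ndotl + ambient);
                return fixed4(lit, 1);
            }
            ENDCG
        }
    }
}
```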

If somebody has some insight to share on this topic, either why this can’t ever work or how it could be made working, I’d appreciate it.

Each additional pixel light in forward rendering is drawn in another pass after the fact, adding draw calls, so an additional light is like drawing the affected geometry again. If your objects were batched but only some of them are hit by another light, the objects that shared the one call are no longer the same to the GPU. You should look into vertex lights, as they’re far less damaging to performance and are all handled in one call.

Though I should mention that objects with different lights affecting them still won’t be able to be batched.

The reason explained above is why in forward rendering there are a limited number of lights allowed in the first place. Each light is another pass on all geometry that it hits.

This is exactly why Deferred and later Forward+ rendering techniques were invented. If your platform supports it, you should use one of those instead, as they pretty much supersede traditional Forward in almost every way. The only time I’d consider Forward is if there is going to be no dynamic per-pixel lighting applied at all.

I’m not worried about the additional passes, in case that wasn’t explained well enough in the op. I’m quite aware that they happen and that they need to happen. My question is, why draw calls within these passes aren’t batched (and even if they can’t be, why they issue so many setPass-calls).

Again, the scenario is a scene with objects all using the same material and static batching. It contains the main directional light and one point light.

What I expect to happen:

  1. The forward base pass for the main directional light is rendered using 1 setPass-call and draw calls for all objects in the scene in a single static batch.
  2. The forward add pass for the point light is rendered using 1 setPass-call and draw calls for all objects affected by the light in a single static batch.

This should result in 2 setPass-calls total, and 2 static batches containing all the necessary draw calls.

What really happens:

  1. The forward base pass for the main directional light is rendered using multiple setPass-calls and draw calls are split up between some static batches and singular draw calls, seemingly without reason.
  2. The forward add pass for the point light is rendered using lots of setPass-calls and draw calls aren’t batched at all.

This results in an enormous increase in setPass-calls and batches over my expected scenario, and I want to know why this happens.

Honestly, that has been a mystery to me for many years, but I imagine it has something to do with how the engine organizes its data deep down. Most likely there are some big sacrifices that would have to be made either in general performance or in how flexible the system can be. I vaguely recall reading some old post about it, but it was rather hand-wavy and didn’t explain much.

Suffice it to say that I moved to deferred shortly after discovering the issue and never looked back. More recently I’ve been taking advantage of Forward+ in the newer pipelines as it allows for the best of both worlds in many cases (transparency, custom lighting, AND many lights).

Some further reading from the classic GPU Gems might shed some light.

https://developer.nvidia.com/sites/all/modules/custom/gpugems/books/GPUGems/gpugems_ch15.html

As I said earlier, each additional light is another pass. A pass in the shader makes a new setPass-call, which then means a new draw must be started to accommodate the different light.

There are also limits to how many vertices can be in a batch, so that can be a factor as well.

feast your eyes

Unity - Manual: DrawCallBatching (unity3d.com)

Interesting, thank you. Funnily enough, the paper only ever mentions batches, implying that having them even with additional forward lights should be the norm (I think?). Anyway, I’ll probably switch to deferred for my opaque geometry as you suggested. I initially thought forward would suffice since I only need 1-2 pixel lights, but not with this mess. I’ve looked at Forward+ before, but I’m on BiRP and sadly don’t think it will ever be available for that.

Thank you, but I’m already aware of all of this. You’re still missing my actual problem, it never was about the need for additional passes, but rather how batching within these passes is (not) handled. Please read my posts thoroughly.

If it’s a fully custom shader on BiRP, you can use the vertex light data (in the fragment shader) to light with up to 4 lights in one pass, instead of using add passes.

#pragma multi_compile _ VERTEXLIGHT_ON

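// Declare these yourself only if UnityCG.cginc is not included;
// it already declares them via UnityShaderVariables.cginc.
float4 unity_4LightPosX0;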
float4 unity_4LightPosY0;
float4 unity_4LightPosZ0;
half4 unity_4LightAtten0;
half4 unity_LightColor[8];

struct SubLights
{
    float3 lightVectorWS[4];
    float  distanceSqr[4];
    float  lightAtten[4];
    float3 color[4];
};

// Unpacks the four vertex-light slots into per-light arrays for a world-space position.
SubLights GetSubLights(float3 positionWS)
{
    SubLights output;
    float4 toLightX = unity_4LightPosX0 - positionWS.x;
    float4 toLightY = unity_4LightPosY0 - positionWS.y;
    float4 toLightZ = unity_4LightPosZ0 - positionWS.z;
    float4 distanceSqr = 0.0;
    distanceSqr += toLightX * toLightX;
    distanceSqr += toLightY * toLightY;
    distanceSqr += toLightZ * toLightZ;
    output.lightVectorWS[0] = float3(toLightX.x, toLightY.x, toLightZ.x);
    output.lightVectorWS[1] = float3(toLightX.y, toLightY.y, toLightZ.y);
    output.lightVectorWS[2] = float3(toLightX.z, toLightY.z, toLightZ.z);
    output.lightVectorWS[3] = float3(toLightX.w, toLightY.w, toLightZ.w);
    output.distanceSqr[0] = distanceSqr.x;
    output.distanceSqr[1] = distanceSqr.y;
    output.distanceSqr[2] = distanceSqr.z;
    output.distanceSqr[3] = distanceSqr.w;
    output.lightAtten[0] = unity_4LightAtten0.x;
    output.lightAtten[1] = unity_4LightAtten0.y;
    output.lightAtten[2] = unity_4LightAtten0.z;
    output.lightAtten[3] = unity_4LightAtten0.w;

    output.color[0] = unity_LightColor[0].xyz;
    output.color[1] = unity_LightColor[1].xyz;
    output.color[2] = unity_LightColor[2].xyz;
    output.color[3] = unity_LightColor[3].xyz;

    return output;
}
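
One possible way to consume GetSubLights in the fragment shader is sketched below. ShadeSubLights is a hypothetical helper, not part of Unity. It applies a plain Lambert term, and its attenuation mirrors the 1 / (1 + distance² * atten) form used by Unity's built-in Shade4PointLights. It assumes the lights in question actually land in the four vertex-light slots of the ForwardBase pass (for example, point lights whose Render Mode is set to Not Important).

```hlsl
// Hypothetical helper (not a Unity built-in): Lambert diffuse from the four
// lights unpacked by GetSubLights, evaluated per pixel in the ForwardBase pass.
half3 ShadeSubLights(SubLights lights, float3 normalWS)
{
    half3 result = 0;
#if defined(VERTEXLIGHT_ON)
    for (int i = 0; i < 4; i++)
    {
        // Clamp to avoid a division by zero when the surface point coincides with the light.
        float distSqr = max(lights.distanceSqr[i], 0.000001);
        float3 lightDirWS = lights.lightVectorWS[i] * rsqrt(distSqr);
        float ndotl = saturate(dot(normalWS, lightDirWS));
        // Same attenuation form as Shade4PointLights in UnityCG.cginc.
        float atten = 1.0 / (1.0 + distSqr * lights.lightAtten[i]);
        result += lights.color[i] * (ndotl * atten);
    }
#endif
    return result;
}
```

In the ForwardBase fragment function, the result of `ShadeSubLights(GetSubLights(positionWS), normalWS)` would then be multiplied by the surface albedo and added to the directional and ambient terms. Unused slots should contribute nothing, since their light color is expected to be black.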