Weird observation about additive blending overdraw/fill rate

I am rendering a ton of large, layered transparent sheets using additive blending. The sheets are all part of a single combined mesh. I am using a custom pixel/vertex shader (not a surface shader).
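Roughly, the setup looks like this – a simplified sketch, not my exact shader (the shader name and property names like _MainTex and _Color are placeholders):

```
Shader "Custom/AdditiveSheets"   // hypothetical name, for illustration only
{
    Properties
    {
        _MainTex ("Texture", 2D) = "white" {}
        _Color   ("Tint", Color) = (1,1,1,1)
    }
    SubShader
    {
        Tags { "Queue"="Transparent" "RenderType"="Transparent" }
        Blend One One   // additive: dst = src * 1 + dst * 1
        ZWrite Off      // transparent sheets don't write depth
        Pass
        {
            CGPROGRAM
            #pragma vertex vert
            #pragma fragment frag
            #include "UnityCG.cginc"

            sampler2D _MainTex;
            fixed4 _Color;

            struct v2f { float4 pos : SV_POSITION; float2 uv : TEXCOORD0; };

            v2f vert (appdata_base v)
            {
                v2f o;
                o.pos = UnityObjectToClipPos(v.vertex);
                o.uv  = v.texcoord;
                return o;
            }

            fixed4 frag (v2f i) : SV_Target
            {
                return tex2D(_MainTex, i.uv) * _Color;
            }
            ENDCG
        }
    }
}
```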

When they are lit up, performance falls through the floor – it’s an obvious fill-rate/overdraw issue. But I’m not noticing any performance drop when they are black (i.e. invisible).

My question is: shouldn’t there be performance issues when the objects are invisible, too? Shouldn’t they go through the exact same pipeline – vertex shader, pixel shader, blending – and cost the same no matter the color of the object? What is the mechanism through which this is being optimized?

It’s probably the GPU itself that skips pixels that have no effect on the final result.

But by the time a pixel has been determined not to contribute to the image, hasn’t all the work already been done? Is the actual additive blend with the framebuffer really that expensive?

Not that expensive, but it’s still work. It makes sense to optimize 0 × A + 1 × B down to just B.
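In blend-equation terms: additive blending computes dst' = src × 1 + dst × 1, so with a black source that’s dst' = 0 + dst = dst – both the framebuffer read and the write become pointless and can be skipped.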

Think about it like this: you have 100 additive quad sprites in your scene. Regardless of where they are or how big they are, the CPU cost to calculate their positions and send them to the GPU is the same, and the vertex shader always works on the same 400 vertices. The pixel shader cost, however, scales with coverage: 100 sprites that are each 1 pixel in size cost just 100 pixel shader runs, which is super cheap, while 100 full-screen sprites cost your resolution × 100. If the GPU can be smart and early-reject before getting to the pixel shader, that’s a lot less work to do.
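To put numbers on that (using 1920×1080 as an example resolution): 100 one-pixel sprites cost 100 fragment shader invocations per frame, while 100 full-screen sprites cost 100 × 2,073,600 ≈ 207 million invocations – a two-million-fold difference for the exact same vertex work.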

Right, but the key word there is “if” – if the GPU can be smart and reject before getting to the pixel shader.

But in order to decide that the pixel is black (i.e. whether it can reject that pixel), it still needs to run the pixel shader, so the only work being saved is blending the pixel into the framebuffer.

I originally had several multiplies and adds, along with a texture sample, in my pixel shader (I’ve since moved most of it to the vertex shader, so it’s now just one texture sample and one multiply), and I was still seeing the same behavior. I would have assumed additive blending would cost somewhere between (a) one add plus one texture sample and (b) one add, since I’d have guessed that reading the framebuffer is cheaper than sampling an ordinary texture.

However, it seems that is not the case at all, and that there is some lurking cost of blending with the framebuffer that I can’t quite figure out.
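For reference, here’s roughly what the refactor looked like – a simplified reconstruction, with placeholder constants (_Intensity, _Fade) standing in for my actual material properties:

```
#include "UnityCG.cginc"

sampler2D _MainTex;
fixed4 _Color;
fixed  _Intensity;   // hypothetical extra material constant
fixed  _Fade;        // hypothetical extra material constant

struct v2f
{
    float4 pos  : SV_POSITION;
    float2 uv   : TEXCOORD0;
    fixed4 tint : COLOR0;
};

// Before: several multiplies per pixel.
fixed4 fragBefore (v2f i) : SV_Target
{
    return tex2D(_MainTex, i.uv) * _Color * _Intensity * _Fade;
}

// After: everything constant across the mesh is folded into one value in
// the vertex shader and interpolated...
v2f vert (appdata_base v)
{
    v2f o;
    o.pos  = UnityObjectToClipPos(v.vertex);
    o.uv   = v.texcoord;
    o.tint = _Color * _Intensity * _Fade;   // hoisted out of the pixel shader
    return o;
}

// ...leaving one texture sample and one multiply per pixel.
fixed4 fragAfter (v2f i) : SV_Target
{
    return tex2D(_MainTex, i.uv) * i.tint;
}
```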

Am I going crazy???

Modern GPUs are actually very smart about this kind of thing. There’s a lot they do between the vertex shader and the fragment shader to speed up rendering. It’s not unreasonable to think they’d reject fragments whose material properties are known to make the pixel contribute nothing. They already reject pixels based on z-depth occlusion and stencil tests: a stack of 100 opaque objects renders faster because of this – after the first (and presumably nearest) object is rendered, everything behind it is rejected before its fragment shader ever runs.
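In ShaderLab render-state terms, the difference is roughly this:

```
// Opaque geometry: depth writes + depth test enable early-Z, so fragments
// hidden behind already-drawn surfaces can be rejected before the pixel
// shader runs.
ZWrite On
ZTest LEqual

// Additive transparency: depth writes are off (every layer has to show
// through), so nothing ever occludes the sheets and no early-Z rejection
// is possible between them.
Blend One One
ZWrite Off
```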

In the case of a black pixel it’s not really that the GPU is all that smart; it’s that the shader has been analyzed before it’s ever run. When shaders are sent to the GPU they get analyzed, and often even modified, by the drivers to optimize them. That analysis might produce a list of constant values the driver can cheaply test against to reject fragments before rendering. For simple shaders that might seem like a waste, but there’s a cost to reading from or writing to the framebuffer, and it’s likely greater than the cost of a few hard-coded comparison checks per poly.

You could probably do some experiments to test this. Assume your pixel shader takes a material property (a color), multiplies it by a texture sample, and outputs the result. The running theory is that the GPU can see that if the material color is black, then no matter what happens in the pixel shader the output will be (0, 0, 0, 0), and it can skip the shader entirely. To test that, try using an all-black texture with a non-black color and see what happens. I doubt the GPU could optimize that out, because the result now depends on the texture contents, which it can’t know without sampling.
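In terms of a fragment shader like the one above, the two cases of the experiment look like this (same _MainTex/_Color sketch as before):

```
fixed4 frag (v2f i) : SV_Target
{
    // Case A (what you have now): _Color = black, texture = anything.
    // The driver can prove sample * (0,0,0,0) == (0,0,0,0) from the
    // material constant alone, without ever running the shader.
    //
    // Case B (the test): _Color = non-black, texture = all black.
    // The output is still zero for every pixel, but knowing that requires
    // actually sampling the texture - no up-front rejection is possible.
    return tex2D(_MainTex, i.uv) * _Color;
}
```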

If that makes up your difference in performance (i.e. suddenly stuff gets slow again), then it would seem the GPU is being super smart.

If there’s no difference, then it seems the blending itself must be the performance bottleneck. Blending does require reading from the output buffer, and I suppose that could be a significant hit on some GPUs. Have you done any more detailed profiling to measure the actual difference in GPU frame rendering time between the two scenarios?

The blending part is handled by dedicated hardware (the ROPs / output-merger stage), separate from the shader part of the GPU, and it has its own maximum throughput, so it can indeed be the bottleneck. With a simple shader, no z-check, and large triangles, it’s not that hard to push the bottleneck to the blending stage.
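To put rough, illustrative numbers on it (assuming a 1920×1080 RGBA8 target): each blended pixel costs about 8 bytes of framebuffer traffic (a 4-byte read plus a 4-byte write), so 100 full-screen additive layers is 2,073,600 × 100 × 8 ≈ 1.66 GB of traffic per frame – around 100 GB/s at 60 fps from blending alone, which is already in the ballpark of some GPUs’ total memory bandwidth.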