For example, if I want a fullscreen background of a starry sky, texture-scrolling clouds, and texture-scrolling mountains, that could be either:
a] 3-5+ full-screen sprite renderers, each of which contributes to overdraw/fill rate because the sprites are on the transparent queue, or
b] 1 full-screen sprite renderer with a shader that samples all the textures and stacks them together in the fragment shader, while each layer still scrolls properly since it’s texture scrolling rather than transform scrolling. With the benefit of writing each pixel once instead of 3-5+ times.
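To make option b] concrete, here’s a minimal sketch of what that stacking shader might look like. All property names (`_StarsTex`, `_CloudsTex`, `_MountainsTex`, and the scroll vectors) are placeholders I made up for illustration; only `_Time`, `v2f_img`, and `vert_img` are Unity built-ins:

```shaderlab
// Sketch of option b]: one fullscreen quad sampling three layers.
// Property names are placeholders, not from any real project.
Shader "Custom/LayeredBackground"
{
    Properties
    {
        _StarsTex ("Stars", 2D) = "black" {}
        _CloudsTex ("Clouds", 2D) = "black" {}
        _MountainsTex ("Mountains", 2D) = "black" {}
        _CloudScroll ("Cloud Scroll (UV/sec)", Vector) = (0.02, 0, 0, 0)
        _MountainScroll ("Mountain Scroll (UV/sec)", Vector) = (0.05, 0, 0, 0)
    }
    SubShader
    {
        // Opaque: the stacked result covers the whole screen,
        // so nothing behind it needs to show through.
        Tags { "Queue"="Geometry" "RenderType"="Opaque" }
        Pass
        {
            CGPROGRAM
            #pragma vertex vert_img
            #pragma fragment frag
            #include "UnityCG.cginc"

            sampler2D _StarsTex, _CloudsTex, _MountainsTex;
            float4 _CloudScroll, _MountainScroll;

            fixed4 frag (v2f_img i) : SV_Target
            {
                // Scroll by offsetting UVs with time; the textures'
                // wrap mode should be set to Repeat.
                fixed4 col = tex2D(_StarsTex, i.uv);
                fixed4 mts = tex2D(_MountainsTex, i.uv + _MountainScroll.xy * _Time.y);
                fixed4 cld = tex2D(_CloudsTex, i.uv + _CloudScroll.xy * _Time.y);
                // Standard alpha compositing, back to front, in one fragment.
                col.rgb = lerp(col.rgb, mts.rgb, mts.a);
                col.rgb = lerp(col.rgb, cld.rgb, cld.a);
                return col;
            }
            ENDCG
        }
    }
}
```

Note that because the combined result is fully opaque, the quad can even go on the geometry queue, avoiding the transparent queue entirely.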
I assume the latter option is more performant, but what is the actual difference? On a modern PC, would it even matter? How many fullscreen, full-rect transparent sprites could you have on screen before performance matters? I have tried GPU profiling some tests, but I can’t figure out how to interpret the results with regard to fill rate/overdraw, or how to set things up so batching or instancing aren’t skewing the raw numbers. I feel like it still matters; I just don’t know by how much, or how to measure it properly. Also, please no remarks about premature optimization; that really doesn’t help me understand the rendering process.
It depends on what you mean by modern. For an example, early reviews of Graveyard Keeper complained about performance problems even on beefy computers (e.g. a GTX 1060). It’s hard to say exactly why, but from a post I saw from the developer there was a decent amount of overdraw. As with all things, you have to test on your lowest-tier target device to make an informed guess. Overdraw, even with a simple shader, adds up and will at least contribute to problems.
An easy way to find out is, on the target device, to create a very simple build where pressing spacebar instantiates a new full-screen sprite, plus a UI counter for the number instantiated and an FPS counter.
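A minimal sketch of that test harness, assuming `spritePrefab` is a placeholder you’d assign in the inspector (a SpriteRenderer scaled to cover the camera view):

```csharp
using UnityEngine;

// Press Space to stack another full-screen transparent sprite;
// watch the FPS readout degrade as overdraw accumulates.
public class OverdrawStressTest : MonoBehaviour
{
    public GameObject spritePrefab; // full-screen transparent sprite (assumed)
    int count;
    float smoothedDelta = 1f / 60f;

    void Update()
    {
        if (Input.GetKeyDown(KeyCode.Space))
        {
            Instantiate(spritePrefab, Vector3.zero, Quaternion.identity);
            count++;
        }
        // Exponentially smoothed frame time so the readout is stable.
        smoothedDelta = Mathf.Lerp(smoothedDelta, Time.unscaledDeltaTime, 0.05f);
    }

    void OnGUI()
    {
        GUI.Label(new Rect(10, 10, 300, 40),
            $"Sprites: {count}   FPS: {1f / smoothedDelta:F1}");
    }
}
```

Run it with vsync disabled (or look at frame time rather than FPS) so the counter isn’t pinned to the refresh rate.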
There are reasons not to stack, though: whether it makes things more complex, whether these objects would really overlap much anyway, and whether you even have performance problems otherwise.
Yeah, I get confused because there are performance complaints about semi-recent indie games that don’t look like resource hogs, like Graveyard Keeper and Hollow Knight. But then Ori and the Blind Forest, which in RenderDoc appears to have a tremendous amount of overdraw compared to similar games (given the number of glow textures and renderers on screen), plus a seemingly much higher texture memory requirement, is more performant. It could be because I believe they use a lot of TransparentCutout instead of Transparent, which I think means many of their sprites can be rendered front to back instead of back to front, letting early-z testing skip the shading of pixels hidden behind other renderers. Whereas if everything is rendered back to front, every pixel gets shaded, resulting in much more drastic overdraw. But I don’t fully know.
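For reference, the cutout idea boils down to a fragment shader along these lines (a minimal sketch; `_MainTex` and `_Cutoff` follow common Unity property naming conventions):

```shaderlab
// Cutout fragment: pixels below the alpha threshold are discarded,
// everything else is written as fully opaque. With ZWrite On in the
// pass, sprites drawn front to back can early-z reject covered pixels.
fixed4 frag (v2f i) : SV_Target
{
    fixed4 col = tex2D(_MainTex, i.uv);
    clip(col.a - _Cutoff); // discard if alpha < _Cutoff
    col.a = 1;
    return col;
}
```

The trade-off is hard pixel edges: cutout can’t do soft anti-aliased borders or partial transparency, which is why it suits some art styles and not others.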
The Unity 2DRenderer devs appear to be interested in implementing an internal version of this, which could be a big deal.
I don’t think it’s a priority even for Unity 2021, but hopefully it gets added eventually.
There’s a Digital Foundry YouTube video on the Switch port of Ori and the Will of the Wisps with tons of details about how they achieved 60 fps. One of the key optimizations is a front-to-back depth prepass that draws only the fully opaque pixels of all 2D elements into the depth buffer. Then everything is rendered back to front with the depth test set to less-than-or-equal, so overdraw is limited to the pixels that are semi-transparent.
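In ShaderLab terms that scheme might look roughly like this. This is a sketch of the idea only, not Moon Studios’ actual shaders; and note that Unity runs every pass of a shader per object, so in practice you’d schedule these as two separate materials/queues or via a custom render pass rather than one multi-pass shader:

```shaderlab
// Pass 1 (run front to back over all 2D elements): depth only.
// Only (nearly) fully opaque texels survive the clip, so the depth
// buffer ends up holding the nearest opaque pixel per screen position.
Pass
{
    ZWrite On
    ColorMask 0   // write no color, depth only
    CGPROGRAM
    // ... vertex shader as usual ...
    fixed4 frag (v2f i) : SV_Target
    {
        fixed a = tex2D(_MainTex, i.uv).a;
        clip(a - 0.99); // keep only (nearly) fully opaque pixels
        return 0;
    }
    ENDCG
}

// Pass 2 (run back to front): normal alpha blending, but the depth
// test rejects any pixel hidden behind an opaque pixel from pass 1.
Pass
{
    ZWrite Off
    ZTest LEqual
    Blend SrcAlpha OneMinusSrcAlpha
    // ... standard textured alpha-blended fragment shader ...
}
```

The result is that fully hidden pixels cost only the cheap depth-only prepass, while blending work is spent exclusively on pixels that actually show through.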