I’ve been working on an idea for 2D games that suffer from large amounts of overdraw, and I have a working “solution” but I’m wondering if it’s a road other people have travelled (and abandoned) before, so I thought I’d just throw it on here and see if anybody has any ideas.
The base problem is that our game uses several different 2D and 3D cameras, all of which render predominantly alpha-blended sprites (with soft edges, but often also with large opaque sections). Because we use a 3D camera for parallax, we have a serious overdraw problem at the horizon, where often enough everything overlaps.
Enter the solution: the foreground (2D) is also often covering most of the screen at this same horizon, so why bother rendering most of the background at all?
The first working setup I have works like this:
Render foreground to RenderTexture
Generate “did I render anything opaque” mask from this
Use that mask to set stencil values at the start of the background camera’s render
Mask out any fragments that have a pre-set stencil value
Blit the foreground RT after the background has been rendered; (mostly) correct transparency composition is achieved via a GrabPass based on the inverse alpha of the foreground RT (this requires specific settings for the foreground's alpha/additive sprites). A rough sketch of the stencil part is below.
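In case it helps to see it concretely, here is a simplified sketch of what the stencil-setting pass (steps 2 and 3) can look like. The shader name and the 0.99 opacity threshold are just placeholders for illustration, not our actual shader:

Shader "Hidden/ForegroundOpaqueStencilMask"
{
    Properties
    {
        _MainTex ("Foreground RT", 2D) = "black" {}
    }
    SubShader
    {
        Pass
        {
            // Full-screen pass, run before the background camera draws anything.
            // Wherever the foreground RT is (nearly) opaque we write stencil ref 1;
            // everywhere else the fragment is clipped and the stencil stays 0.
            Stencil
            {
                Ref 1
                Comp Always
                Pass Replace
            }
            ColorMask 0     // only the stencil buffer is touched
            ZWrite Off
            ZTest Always
            Cull Off

            CGPROGRAM
            #pragma vertex vert_img
            #pragma fragment frag
            #include "UnityCG.cginc"

            sampler2D _MainTex;   // the foreground RenderTexture, bound as the blit material's main texture

            fixed4 frag (v2f_img i) : SV_Target
            {
                fixed alpha = tex2D(_MainTex, i.uv).a;
                clip(alpha - 0.99);   // "opaque enough" threshold, purely illustrative
                return 0;
            }
            ENDCG
        }
    }
}

The background sprite shaders then get a Stencil { Ref 1 Comp NotEqual } block, so any fragment sitting under an opaque foreground pixel is rejected (ideally by the early stencil test) before its shading runs.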
Questions I’m working on:
Where doesn’t this work? (Android seems to be a problem, iOS works fine)
How does it compare performance-wise to just rendering the overdraw? (i.e., when is it faster?)
What can be improved to speed up the whole process?
Has anybody tried something like this before? Right now I’m thinking about extending this to all cameras, and effectively drawing them all front-to-back with no overdraw between cameras (still happens within cameras, obviously).
I’ve attached a GIF of a frame I recorded recently (also contains things like lighting cameras, depth renders, and post effects):
If you are going to do this every frame, there is no point, because what you are trying to do with the stencil buffer is what the z-buffer already does for you automatically. If the objects are drawn from front to back with proper z depths, the z-buffer will automatically filter out the pixels that don't need to be drawn. However, with both the z-buffer and your method, you don't avoid having some testing done to determine whether the pixels should be drawn. Depending on the graphics card, this may be more or less optimized.
If you can do some of your operations once (such as rendering the background into a RenderTexture) and then reuse the result every frame, you might improve your performance, but if you are doing all of this every frame, I think you are actually making performance worse with all your stencil-making and GrabPass stuff.
I think GrabPass in particular is bad, because it forces tile-based renderers (common on mobile) to finish rendering their tiles before the GrabPass can execute.
I would suggest simplifying your rendering again and then starting to measure how performance changes as you switch features on and off in your scene. For example, I would be suspicious of the performance of any post effects on mobile.
I'm not sure the z-buffer would help, since it's essentially all Blend SrcAlpha OneMinusSrcAlpha (and pretty much none of them ZWrite, iirc), so it needs to render them all anyway (and back to front). I think I initially had this idea precisely because the z-buffer didn't help me here the way it would for opaque geometry.
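To illustrate, the state on these sprites is basically the stock transparent setup, something like this stripped-down example (generic, not our actual shader):

Shader "Custom/TypicalBlendedSprite"
{
    Properties
    {
        _MainTex ("Sprite Texture", 2D) = "white" {}
    }
    SubShader
    {
        Tags { "Queue"="Transparent" "RenderType"="Transparent" }
        Pass
        {
            Blend SrcAlpha OneMinusSrcAlpha   // classic alpha blending
            ZWrite Off                        // leaves no depth behind for later draws to test against
            Cull Off

            CGPROGRAM
            #pragma vertex vert_img
            #pragma fragment frag
            #include "UnityCG.cginc"

            sampler2D _MainTex;

            fixed4 frag (v2f_img i) : SV_Target
            {
                // Every covered fragment is shaded and blended, back to front,
                // no matter how much opaque stuff ends up on top of it.
                return tex2D(_MainTex, i.uv);
            }
            ENDCG
        }
    }
}

With ZWrite Off and transparent-queue sorting, the depth buffer never learns that the foreground covered anything, which is exactly why the rejection has to come from somewhere else (stencil in my case).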
Obviously the render cost of the discarded pixels needs to outweigh the additional cost of the GrabPass + stencil operations (which it might, because in our case we do a kind of deferred-style lighting in a forward pipeline), so I still need to run some more in-depth tests to see whether this actually yields any benefits.
One common solution to this is to separate the opaque elements (the centers) from the transparent edges. Especially for the bigger background shapes this might be a good way to go. Render the opaque centers of things first without any blending, making use of the z-buffer. Then render the background, which now avoids most of the overdraw. Then render the transparent edges afterwards with blending. Something like this could work in your case.
To add to this: a common mistake is to use an alpha-tested version of the sprite to render to depth. That does work, but alpha test is really slow on most mobile devices (basically anything not running on an Nvidia Tegra). You really want actual low-poly shapes to render the depth with.
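A rough sketch of what the opaque-center material can look like (names are made up, not a drop-in solution); the key point is that the center submesh is real geometry, so there is no blending and no alpha test, just plain depth-writing rendering:

// Material for the opaque "center" submesh.
Shader "Custom/SpriteOpaqueCenter"
{
    Properties
    {
        _MainTex ("Sprite Texture", 2D) = "white" {}
    }
    SubShader
    {
        Tags { "Queue"="Geometry" "RenderType"="Opaque" }
        Pass
        {
            ZWrite On       // write depth so later draws can be rejected against it
            ZTest LEqual
            Cull Off

            CGPROGRAM
            #pragma vertex vert_img
            #pragma fragment frag
            #include "UnityCG.cginc"

            sampler2D _MainTex;

            fixed4 frag (v2f_img i) : SV_Target
            {
                // The submesh only covers texels that are fully opaque,
                // so alpha can be ignored entirely here.
                return tex2D(_MainTex, i.uv);
            }
            ENDCG
        }
    }
}

The edge submesh keeps a normal blended sprite material (like the generic one sketched further up) and is drawn afterwards in the transparent queue; anything drawn later with depth testing enabled gets rejected wherever these centers have already written depth.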
I’ve looked into rendering the edges separately, but I’m not sure how to accomplish this without splitting the meshes up (which probably means extending the SpriteRenderer component, which I don’t really feel up to). I know UE4 supports this out of the box, but I haven’t seen anything from Unity on this.
I would probably forgo SpriteRenderer in this case and use a MeshRenderer with two submeshes: one for the opaque part and one for the edges. Takes more effort, sure, but doesn't optimization always?
Check out SpriteSharp, IIRC it has support for this. Haven’t tried that aspect of it myself, though.
SpriteSharp looks like a great package. I’m attempting to integrate it now. Will post results here once I’ve done this.
EDIT:
So far the heaviest scene (the one the above GIF is from) seems to be noticeably faster on Xbox One (plain overdraw was about the same fps as the stencil method); still need to test on iOS/Android. I've updated all the background + foreground sprites, and I wrote an editor script that finds SpriteRenderers on these layers, checks whether they have a corresponding "(Alpha)" sprite, and adds it as a child (swapping around some materials etc.).
EDIT 2:
Funny thing is, on an iPad 3 at non-retina resolution the average framerate has gone down as a result of this change, but overall heat and battery usage have improved tremendously. I'm guessing it's now CPU-bound on that particular device. Still need to test on devices with newer CPUs.