Unity has always had an issue with anti-aliasing and directional light shadows: the shadows don’t respect the anti-aliasing and undo the work the scene anti-aliasing is attempting to do. The reason for this is fairly straightforward once you understand it, but the ways to fix the issue are not really obvious. Directional shadows are rendered out in a full screen buffer using the scene depth, and the scene depth is not anti-aliased, so the resulting scene shadows aren’t either.
That said I came up with a hacky solution that with some more massaging might be made useful so I’m putting this here to see if anyone else wants to take a gander at it.
Below is a forward rendered scene with 8x MSAA. Note the edges against the skybox are smooth in both images, but the shadow edges are chunky in the top image while everything is smooth in the bottom one.
Now a valid question might be “why not just use a post process AA”? The answer to that is post process AA techniques don’t address temporal aliasing (flickering of pixels from one frame to another) and that’s the kind of aliasing that VR has the biggest problem with. The flicker is still apparent even if the edge is blurred.
This is just a proof of concept and not optimized or even really usable code. The shader is essentially an unlit shader that samples the screen space shadows.
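To make that concrete, here is a minimal sketch of the idea for the built-in pipeline (non-VR, main directional light only). This is not the actual prototype; the shader name is mine, and _ShadowMapTexture is the buffer Unity’s screen space shadow resolve pass writes:

```hlsl
// Unlit shader that just displays the screen space shadow buffer.
Shader "Unlit/ScreenSpaceShadowSample"
{
    SubShader
    {
        Tags { "RenderType"="Opaque" "Queue"="Geometry" }
        Pass
        {
            Tags { "LightMode"="ForwardBase" }
            CGPROGRAM
            #pragma vertex vert
            #pragma fragment frag
            // needed so the SHADOWS_SCREEN variant exists and the
            // shadow buffer gets bound
            #pragma multi_compile_fwdbase
            #include "UnityCG.cginc"

            // the output of Unity's screen space shadow resolve pass
            sampler2D _ShadowMapTexture;

            struct v2f
            {
                float4 pos       : SV_POSITION;
                float4 screenPos : TEXCOORD0;
            };

            v2f vert (appdata_full v)
            {
                v2f o;
                o.pos = UnityObjectToClipPos(v.vertex);
                o.screenPos = ComputeScreenPos(o.pos);
                return o;
            }

            fixed4 frag (v2f i) : SV_Target
            {
                float2 uv = i.screenPos.xy / i.screenPos.w;
                // r channel holds the shadow attenuation
                return tex2D(_ShadowMapTexture, uv).rrrr;
            }
            ENDCG
        }
    }
}
```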
I haven’t done enough testing to gauge performance; in my test scene there’s no perceivable difference, maybe ~0.2ms @ 1920x1080, but this scene isn’t exactly indicative of a real game scene, and the additional depth samples this shader does are going to start to cost more in a scene that actually uses textures. How much more I couldn’t say without extra testing. The shader is doing up to 5 depth samples for each fragment sample. With 4x MSAA this means anywhere from 1 additional texture sample up to a worst case of 20 additional texture samples per pixel; with 8x MSAA this goes up to 40, but that’ll be exceedingly rare as every coverage sample of the MSAA would have to be of a different polygon. It’s plausible for complex geometry with 4x MSAA, but if you’re hitting that with 8x MSAA I would say you need to rethink your mesh density.
1.0 / 10000.0 is because that form is a little easier to tweak, and since both values are constant the compiled shader stores the number as 0.0001 anyway. It is basically a dumb magic number and needs some work to be a slightly less dumb magic number.
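The depth samples come in where the naive version above falls apart: at geometry edges, the screen space shadow buffer was built from the non-MSAA depth, so the shadow value under a fragment may belong to a different surface. Here is a sketch of the kind of depth-rejection loop involved, with my own function name and offsets (a guess at the shape of it, not the actual code):

```hlsl
// Compare the fragment's own depth against the depth buffer at the
// center and four neighboring texels, and take the shadow value from
// the first texel whose depth is close enough (else the best match).
UNITY_DECLARE_DEPTH_TEXTURE(_CameraDepthTexture);
float4 _CameraDepthTexture_TexelSize;

// fragDepth01 is the fragment's own linear 0-1 depth, e.g.
// Linear01Depth(i.screenPos.z / i.screenPos.w)
fixed sampleShadowDepthAware (float2 uv, float fragDepth01)
{
    static const float2 offsets[5] = {
        float2( 0, 0),
        float2( 1, 0), float2(-1, 0),
        float2( 0, 1), float2( 0,-1)
    };
    const float threshold = 1.0 / 10000.0; // the dumb magic number

    float2 bestUV = uv;
    float bestDiff = 1e5;
    for (int t = 0; t < 5; t++)
    {
        float2 sampleUV = uv + offsets[t] * _CameraDepthTexture_TexelSize.xy;
        float sceneDepth01 = Linear01Depth(
            SAMPLE_DEPTH_TEXTURE(_CameraDepthTexture, sampleUV));
        float diff = abs(sceneDepth01 - fragDepth01);
        if (diff < bestDiff)
        {
            bestDiff = diff;
            bestUV = sampleUV;
        }
        if (diff < threshold)
            break; // close enough; this early out is why it's "up to" 5 samples
    }
    return tex2D(_ShadowMapTexture, bestUV).r;
}
```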
I’m doing the uv offsets in the pixel shader because this is just a prototype and because it isn’t intended for mobile. I’ll explain.
Most of the time people think you should do as much work in the vertex shader as possible, because there are fewer vertices than pixels, so the cost of the math is reduced. This is true, but there’s a cost to moving data from the vertex shader to the pixel shader that most people don’t realize, and GPUs are really fast at calculations. The savings from calculating something fewer times might be lost if it’s a lot of data or not a lot of calculation. In this case it’s transferring 10 float values vs the cost of 4 multiplies. With some optimization that could be 8 floats vs 2 multiplies. Either way doing 4 multiplies is nearly free, but 10 floats are not. If there were two or three times as much math involved it might start to make sense.
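To make the numbers concrete, here’s roughly what the two options look like (illustrative only, not the actual shader):

```hlsl
// Option A: compute the offset UVs in the fragment shader.
// One float4 interpolator in, a handful of multiply-adds per fragment.
float2 uv = i.screenPos.xy / i.screenPos.w;
float2 uvRight = uv + float2(_CameraDepthTexture_TexelSize.x, 0);
float2 uvLeft  = uv - float2(_CameraDepthTexture_TexelSize.x, 0);
float2 uvUp    = uv + float2(0, _CameraDepthTexture_TexelSize.y);
float2 uvDown  = uv - float2(0, _CameraDepthTexture_TexelSize.y);

// Option B: compute the offsets per vertex and interpolate them.
// The math runs fewer times, but five float2s (10 floats) now cross
// the vertex-to-fragment boundary instead of one float4:
//
// struct v2f
// {
//     float4 pos     : SV_POSITION;
//     float2 uv      : TEXCOORD0;
//     float2 uvRight : TEXCOORD1;
//     float2 uvLeft  : TEXCOORD2;
//     float2 uvUp    : TEXCOORD3;
//     float2 uvDown  : TEXCOORD4;
// };
```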
On mobile, calculating UVs in the vertex shader specifically is still a huge win, as it means the GPU can cache the texture values before running the fragment shader, and the gap between data transfer cost and calculation speed isn’t as wide as it is on desktop. So if I was writing this for a mobile device I would be doing a lot more in the vertex shader.
After posting this I realized this technique has a lot of similarities to Inferred Rendering, which made me realize it could be abused to get limited shadowing on transparency. I have this working with AlphaToMask right now, but only one layer at a time.
This is not correct: with MSAA the fragment shader is executed only once for all subsamples inside a pixel, so the number of executed fragment functions is always the same as if no MSAA were applied.
You are correct that MSAA only runs the fragment shader once per pixel, and in the best case it is identical to no MSAA. But it’s once per pixel per triangle, and a different triangle can be rendered per subsample. So for 4x MSAA, up to four fragment shader invocations may be executed per pixel. And that’s ignoring potential overdraw.
For example, if the subpixel samples all hit a different triangle, like in the case of a vertex shared by 4 triangles sitting right at the center of the pixel, then the fragment shader will be executed 4 times. 4 invocations * 5 depth texture samples = 20 additional texture samples.
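If you want to see where this happens in a scene, a little debug fragment shader can visualize it (my own sketch; needs #pragma target 5.0 for the SV_Coverage input, and MSAA_SAMPLES is a constant you set to match the quality settings):

```hlsl
#define MSAA_SAMPLES 4

// Tint by how many of the pixel's subsamples this invocation covers.
// Interior pixels get one invocation covering everything (green);
// edge pixels get several invocations, each covering only part of
// the mask (toward red).
fixed4 frag (v2f i, uint coverage : SV_Coverage) : SV_Target
{
    float covered = countbits(coverage) / (float)MSAA_SAMPLES;
    return lerp(fixed4(1, 0, 0, 1), fixed4(0, 1, 0, 1), covered);
}
```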
True, though it only slightly affects the overall number of pixel shader invocations in an average scene. It’s misleading to assume the worst case when assessing the performance implications of MSAA.
And I’m not assuming the worst, I’m simply stating the possible range. I even said the worst case is exceedingly rare. Plus there are plenty of people new to real time rendering importing their high res models directly into Unity, so it’s probably less rare than we would hope.
Though in a lot of ways it is always much worse than most people expect, even when they understand it only runs on tri edges. If a single subsample in a pixel quad is on a tri, then all 4 pixels in the quad have to run the shader. My “worst case” estimate is actually a lowball: the worst case for 4x MSAA is actually 16 shader invocations per pixel, assuming best case overdraw and no transparency. It’s easily worse than this with complex models due to overdraw too, even without the perfect tri corner case. Mobile and Nvidia GPUs save a bit by being tile-based, reducing the impact of overdraw, so that’s less of a concern there.
There were several talks near the end of the Xbox 360 lifespan on using MSAA with deferred rendering that had some good “holy sh*t!” images when they visualized how many pixels the GPU was multi-sampling.
This is from an old Nvidia example: the top shows all the tri edges the GPU would be multi-sampling, vs a custom detection method based on depth discontinuities (which is closer to what most people would likely expect). And I think you can agree that that boxy corner of Sponza has far less geometric detail than most games would.
Wait a second, I didn’t get it… This is how I think it works, please correct me if I’m wrong: let’s assume we are in MSAA 4x mode, no overdraw in our scene, no transparency, no depth writes in the shader, no alpha-to-coverage, and we are talking exclusively about the fragment shader from here on.
If a pixel quad is fully inside a triangle, then all 4 subsamples share the same color data, which is gathered from one shader invocation; this invocation is for the exact pixel center position and does not correlate to any subsample position. If the pixel quad only partially intersects the triangle, the procedure is the same for that one triangle: the shader is invoked only once, again for the exact pixel center position, but this time the bitmask to write into the MSAA texture is calculated by checking which of the four subsamples are inside the triangle. In both cases, for one pixel, the result of one shader invocation and a (coverage) bitmask are passed on from the fragment shader stage, and the hardware knows how to interpret the result, writing (after the depth test, see below) the same color to all texture subsamples whose corresponding bit in the bitmask is set to 1.
Now let’s look at depth: during the rasterization stage, depth values for all four subsamples are calculated based on the triangle’s plane and the subsample positions. All four values participate in four different depth tests, and the depth test result (a bitmask) is combined with the coverage bitmask to decide whether or not to write the color from the shader invocation to each texture subsample’s color data. (Of course there is an early out: if none of the 4 depth tests pass, no shader invocation occurs.)
TLDR: for one pixel and one triangle, the hardware in any case executes the pixel shader once (at the pixel center coordinates), computes depth for all 4 subsamples in the rasterization stage, performs 4 depth tests, computes the coverage bitmask in the rasterization stage (4 point-in-triangle tests), and combines the depth and coverage bitmasks. That is it: one color vector, a 4-bit bitmask, and four depth values are passed to the texture write hardware (if we don’t take hardware data compression into account).
If the pixel is on the edge of any triangles, it will be processed as many times as the number of triangles its quad intersects. At maximum we’ll get past the depth test stage only 4 times, so at most only 4 shader invocations will occur.
Now let’s take into account that most GPUs process not individual pixels but 2x2 tiles of pixels at once. So even if only one subsample out of the 16 in this tile is inside a triangle, all 4 pixels will invoke the shader program, but only one result will be used… but this is not so different from no-MSAA mode, where the same rules apply; furthermore, all 4 shader invocations are performed in parallel, so they can’t be considered “3 wasted-performance invocations”.
Of course the worst-case 4x multiplier will go up in the case of overdraw regions, alpha-to-coverage mode, transparency, or explicit depth calculations in the fragment shader, which disables the early-Z out (and in that case, by the way, all four depth values are the same for the pixel’s subsamples, so MSAA does not help with aliasing in any way).
Note that when I say “pixel quad” I’m explicitly referring to the 2x2 pixel tiles. Otherwise, yes. A single invocation of the shader per pixel, at the center of each pixel, if all subsamples of that pixel are of the same triangle. Whether or not that correlates to a subsample depends on the implementation, but generally for 4x it’s the rotated grid / 4 rooks pattern, in which case yes, there is no correlation.
But apart from terminology, I think we’re in agreement.
As you said, GPUs work in 2x2 tiles of pixels (which I called a quad). With 4x MSAA each of those pixels has 4 subsamples, making 16 possible subsamples. If any one of those subsamples is on a triangle, all 4 pixels render the fragment shader for that triangle. That makes the worst case 16 fragment shader invocations per pixel, in the unlikely case that all 16 subsamples in a 2x2 tile of pixels are each sampling a unique triangle. You are also correct that without MSAA the same rules for the 2x2 tiles exist, so it’s possible for 4 fragment shader invocations per pixel to occur with MSAA disabled.
Your contention seems to be that:
A) I am counting those additional invocations due to the 2x2 tile, which I’ll admit is making things more confusing than it needs to be, though it is technically correct. 4x MSAA is not 16x more shader invocations.
B) I am bringing up the worst case at all, since it is so exceptionally rare. However, if you have something as simple as the default Unity sphere mesh small enough on screen to only be a few pixels wide, this will absolutely be the case, as every subsample is likely to be of a different triangle since the individual triangles are significantly smaller than a single pixel. I see this all the time, though: people take the sphere and use it to put a dot on screen because they don’t know any better, or it got left there from prototyping. In the last game I shipped, we at one point had berry bushes with 1600-poly spheres for each of the 8 berries on the otherwise <100-poly bush. So I contend this case happens far more often than one might expect.
Even in AAA games there is a great story of a game’s framerate & memory budget suddenly having problems and it being tracked down to a box of bullets … where each bullet has its own unique set of 2k textures and a 30k mesh, and the entire box is actually filled with 30 bullets, not just the top few that are visible…
Sadly, no. It did in some early versions, and then it didn’t by the time non-beta builds of Unity were required to run it. Well, that’s not entirely true. As of LWRP 2.0.4 if you disable shadow cascades the main directional light shadows are sampled in the forward pass instead of the “Screenspace shadow resolve” as they call it in the change notes. https://docs.unity3d.com/Packages/com.unity.render-pipelines.lightweight@2.0/changelog/CHANGELOG.html
For mobile platforms, Unity’s forward renderer has never used screen space shadows, and in some versions of Unity it was possible to disable them even for desktop / console builds. And by some versions I don’t mean major versions: in Unity 5.3 you could disable screen space shadows from the quality settings (using a setting not exposed in the inspector), but in 5.4 it was removed. In some versions afterward, disabling the screen space shadow shader would cause it to fall back to sampling the shadows in the forward pass, and in others it would just stop rendering the shadows entirely. In all of these cases only hard shadows were an option.
With LWRP 2.0.4 you can still use soft shadows for non-screen space shadows, which is a nice improvement over the built-in forward renderer, but there are still some issues. Since you can’t use cascades with non-screen space shadows, you generally need to set the shadow distance fairly small, and currently there is a hard edge where the shadow map ends rather than the soft fade the built-in renderer has. When using soft shadows and a larger range, the light bias has to be set quite high, leading to a lot of light bleeding. The bias settings have to be set much higher in the LWRP than in the built-in renderer even with cascades, as the LWRP uses the same bias for all cascades rather than properly increasing it for each, so even there it’s a bit of a qualitative drop from the built-in renderer. This means either the shadows in the distance show significant shadow acne, or your bias is set for the largest cascade, which again produces significant light bleeding.
The most annoying part is the early versions of the LWRP did soft shadows and cascades in the forward passes, even on transparencies, which is pretty much exactly what many of us doing VR want. It was removed because the other techniques are more efficient and they were experiencing an explosion of shader variants as the LWRP is being designed to cover a broad range of platforms and uses, now even more so.
Also, Unity for their part has acknowledged the current setup isn’t great for VR, and internally there are plans to look into this more.
I see! Very, very informative, thank you! I’m actually doing a school VR game right now (due next month), and the shadows, while not that visible in VR, kind of bug me a bit. Would your shader be able to solve the issue?
The concept shown in the shader, yes. The shader itself isn’t doing any shading, just sampling the shadow map and displaying that directly. To use this in a real environment would require some additional work to override the functions in AutoLight.cginc, or a completely custom lighting model not using Unity’s shading code or surface shaders. I’ve shipped two games using variations on this technique, though I haven’t gotten it working properly for my current project, since Unity keeps making modifications to AutoLight.cginc that make it harder for me.
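For a rough idea of what the AutoLight.cginc override looks like (a hedged sketch: the macro names shift between Unity versions, which is exactly the fragility I mean, and sampleShadowDepthAware is the hypothetical depth-aware sampler sketched earlier):

```hlsl
#include "AutoLight.cginc"

// With SHADOWS_SCREEN defined, _ShadowCoord is a screen position, so
// reroute the stock screen space lookup through the depth-aware one.
#if defined(SHADOWS_SCREEN)
    #undef SHADOW_ATTENUATION
    #define SHADOW_ATTENUATION(a) \
        sampleShadowDepthAware( \
            a._ShadowCoord.xy / a._ShadowCoord.w, \
            Linear01Depth(a._ShadowCoord.z / a._ShadowCoord.w))
#endif
```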