We have a custom alpha cutout shader for trees written using Better Shaders, and I noticed it was pretty slow (on one specific current-gen console at least; I haven't tested on other devices), so I've been investigating. It seems like the moment I add clip() anywhere in the shader, performance drops substantially compared to a similar Shader Graph shader or the HDRP Lit shader set to cutout, even if all the pixels are actually drawn. I had a look at the code generated by Shader Graph and saw it uses a macro, GENERIC_ALPHA_TEST. I tried using that instead of clipping manually, but it doesn't seem to change the performance. I've also seen that Shader Graph itself used to be slower than the HDRP Lit shader, so I presume there is a better way to do this, but I have no idea what it is (Unity Issue Tracker - A most simple Shader Graph with Alpha clipping is performing worse when compared to the HDRP/Lit Shader).
Does anyone have any idea how Shader Graph and HDRP Lit are doing this so much faster? It's the difference between 100% GPU utilisation at ~50 fps using a basic shader with clip() and ~75% utilisation at 60 fps using HDRP Lit or a Shader Graph shader set to cutout, so not a small difference.
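For context, the manual clip is just the usual pattern, something like this (the texture and property names here are placeholders for illustration, not our actual shader):

```
// Sketch of the manual alpha test in the fragment/surface function.
// _BaseMap and _AlphaCutoff are placeholder names, not the real properties.
float alpha = tex2D(_BaseMap, uv).a;
clip(alpha - _AlphaCutoff);   // discards the pixel when alpha < cutoff
```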
HDRP Lit is calling the same DoAlphaTest() function that GENERIC_ALPHA_TEST calls. The difference appears to be that it also sets an extra define when alpha test is enabled:
SHADERPASS_GBUFFER_BYPASS_ALPHA_TEST
That DoAlphaTest() function also has a note about this.
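Roughly, it boils down to this (paraphrased sketch, not the exact HDRP source; check Material.hlsl for the real function and comment):

```
// Paraphrased sketch of HDRP's DoAlphaTest().
// When the pass defines SHADERPASS_GBUFFER_BYPASS_ALPHA_TEST, the clip is skipped
// entirely, because a forced depth prepass has already discarded the cut-out pixels.
void DoAlphaTest(float alpha, float alphaCutoff)
{
#ifndef SHADERPASS_GBUFFER_BYPASS_ALPHA_TEST
    clip(alpha - alphaCutoff);
#endif
}
```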
Thanks! That sent me in the right direction. It took me a while to understand, but I think I get it now: they force a depth prepass and do the clip there. There is no need to do the clip in the following passes; they can just use ZTest Equal instead, and that seems to yield much better performance. I'll have to go dig into the Better Shaders passes to understand how that could be applied there.
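For anyone finding this later, the shape of the technique looks roughly like this. This is a minimal built-in pipeline sketch for brevity, not how HDRP or Better Shaders actually structure their passes; shader name, properties, and pass contents are illustrative only:

```
Shader "Hidden/CutoutPrepassSketch"
{
    Properties
    {
        _MainTex ("Texture", 2D) = "white" {}
        _Cutoff ("Alpha Cutoff", Range(0,1)) = 0.5
    }
    SubShader
    {
        Tags { "Queue"="AlphaTest" "RenderType"="TransparentCutout" }

        // Pass 1: depth-only prepass, the only place clip() runs.
        Pass
        {
            ZWrite On
            ColorMask 0

            CGPROGRAM
            #pragma vertex vert
            #pragma fragment frag
            #include "UnityCG.cginc"

            sampler2D _MainTex;
            float4 _MainTex_ST;
            float _Cutoff;

            struct v2f { float4 pos : SV_POSITION; float2 uv : TEXCOORD0; };

            v2f vert (appdata_base v)
            {
                v2f o;
                o.pos = UnityObjectToClipPos(v.vertex);
                o.uv = TRANSFORM_TEX(v.texcoord, _MainTex);
                return o;
            }

            fixed4 frag (v2f i) : SV_Target
            {
                // Cut-out pixels never make it into the depth buffer.
                clip(tex2D(_MainTex, i.uv).a - _Cutoff);
                return 0;
            }
            ENDCG
        }

        // Pass 2: shading pass with no clip(); only fragments that survived
        // the prepass match the stored depth, so ZTest Equal does the cutout.
        Pass
        {
            ZWrite Off
            ZTest Equal

            CGPROGRAM
            #pragma vertex vert
            #pragma fragment frag
            #include "UnityCG.cginc"

            sampler2D _MainTex;
            float4 _MainTex_ST;

            struct v2f { float4 pos : SV_POSITION; float2 uv : TEXCOORD0; };

            v2f vert (appdata_base v)
            {
                v2f o;
                o.pos = UnityObjectToClipPos(v.vertex);
                o.uv = TRANSFORM_TEX(v.texcoord, _MainTex);
                return o;
            }

            fixed4 frag (v2f i) : SV_Target
            {
                return tex2D(_MainTex, i.uv);
            }
            ENDCG
        }
    }
}
```

The win is that the shading pass no longer contains a discard, so the GPU can keep early/hierarchical depth rejection enabled for it instead of falling back to late depth testing.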