How can I discard all black pixels or make them transparent without losing performance?

So, I’ve managed to put together a shader file for a snowfall effect (in all honesty, I found most of the code online). However, the original code made the background black. With some research I’ve managed to discard the black pixels, essentially making the world behind the snow visible.

The problem I am having is that the shader is cutting my frame rate by more than half, from around 1200 fps to 400-500 fps. I think that’s too many frames to lose for this, and I believe it can be done more efficiently.

I think the culprit is the if statement and the loop, but I’m hoping somebody can share some knowledge on this so I can try to get better performance out of it.

I’m fairly new to Shader writing but I am enjoying the challenge.

Shader "Custom/ParticleStorm"
{
    Properties
    {
        LAYERS("Layers",Int) = 100
        DEPTH("Depth",Range(0.0,1.0)) = 0.2
        WIDTH("Width",Range(0.0,2.0)) = 0.9
        SPEED("Speed",Range(0.0,2.0)) = 0.9                 
    }
    SubShader
    {
        Tags { "RenderType"="Transparent" "Queue"="Transparent" }
        Blend SrcAlpha OneMinusSrcAlpha
       
        Pass
        {
            CGPROGRAM
            #pragma vertex vertex_shader
            #pragma fragment pixel_shader
            #pragma target 3.0

            #include "UnityCG.cginc"

            int LAYERS ;
            float DEPTH,WIDTH,SPEED ;
            static const float3x3 p = float3x3(13.323122,23.5112,21.71123,21.1212,28.7312,11.9312,21.8112,14.7212,61.3934);

            struct custom_type
            {
                float4 vertex : SV_POSITION;
                float2 uv : TEXCOORD0;
            };

            custom_type vertex_shader (float4 vertex : POSITION, float2 uv : TEXCOORD0)
            {
                custom_type vs;
                vs.vertex = UnityObjectToClipPos (vertex);
                vs.uv = uv;
                return vs;
            }

            float4 pixel_shader (custom_type ps) : SV_Target
            {
                float2 uv = ps.uv.xy;
                float3 acc = float3(0,0,0);

                float dof = 5.*sin(_Time.g*.1);
                for (int i=0;i<LAYERS;i++)
                {
                    float f = float(i);
                    float2 q = uv*(1.+f*DEPTH);
                    q += float2(q.y*(WIDTH*fmod(f*7.238917,1.)-WIDTH*.5),SPEED*_Time.g/(1.+f*DEPTH*.03));
                    float3 n = float3(floor(q),31.189+f);
                    float3 m = floor(n)*.00001 + frac(n);
                    float3 mp = (31415.9+m)/frac(mul(m,p));
                    float3 r = frac(mp);
                    float2 s = abs(fmod(q,1.)-0.5+0.9*r.xy-0.45);
                    s += 0.01*abs(2.*frac(10.*q.yx)-1.0);
                    float d = .6*max(s.x-s.y,s.x+s.y)+max(s.x,s.y)-.01;
                    float edge = .005+.05*min(.5*abs(f-5.-dof),1.);
                    float t = smoothstep(edge,-edge,d)*(r.x/(1.+.02*f*DEPTH));
                    acc += float3(t,t,t);
                    //acc += float3(t,t,t);
                }
                //return float4(float3(acc),1);

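                // Squared length of the accumulated snow colour; near zero means "black".
                half3 transparent_diff = acc - float3(0,0,0);
                half transparent_diff_squared = dot(transparent_diff, transparent_diff);

                // Drop near-black pixels so the scene behind the quad stays visible.
                if (transparent_diff_squared < 0.05)
                    discard;

                return float4(1,1,1,1);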
            }
            ENDCG
        }
    }
}

Are these performance numbers measured with nothing else in the scene, just this shader, comparing the version with the condition against the version without it?

That’s a very simple conditional that shouldn’t have much impact, as it’s not branching into another long path of computations. Discarding can have a little overhead on some platforms, though. 400-500 fps is still really good if that’s on your target platform, especially considering all the math you’re doing in that shader, and in a loop too.

But instead of discarding, you can set the .a value of your returned color to 0 when that condition is true: instead of returning 1,1,1,1, you’d return 1,1,1,0.
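As a rough sketch of that suggestion, the end of the fragment function could hand the accumulated value to a small helper like the one below. The helper name `snow_with_hard_alpha` is made up for illustration and is not part of the posted shader; the 0.05 threshold is the one from the original discard test.

```hlsl
// Hypothetical helper (not in the original shader): returns white snow
// with alpha 0 for near-black pixels instead of discarding them.
float4 snow_with_hard_alpha(float3 acc)
{
    // Same squared-brightness measure as the original discard test.
    half brightness_squared = dot(acc, acc);

    // step(edge, x) is 1 when x >= edge and 0 otherwise, so pixels below
    // the 0.05 threshold get alpha 0 and all others stay fully opaque.
    half alpha = step(0.05, brightness_squared);

    return float4(1, 1, 1, alpha);
}
```

At the end of pixel_shader this would replace the discard block with `return snow_with_hard_alpha(acc);`. With Blend SrcAlpha OneMinusSrcAlpha set, pixels with alpha 0 leave the background untouched. Whether it is any faster than discard depends on the platform, so it needs measuring.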

Good question. Well, I’ve got a scene with a 3D character with a walking animation and a plane with a custom shader on it that creates tracks in the snow as the character walks around. There’s not much going on at the moment; that’s literally it. With that, it runs at around 1000-1200 fps. As soon as the custom snowfall shader is attached to a quad, the fps drops to around 300-500.

I’m losing 500-700 frames.

I’ve done some testing and it seems that this piece of code is the culprit:

half3 transparent_diff = acc - float3(0,0,0);
half transparent_diff_squared = dot(transparent_diff, transparent_diff);
if(transparent_diff_squared < float(0.05))
    discard;
return float4(1,1,1,1);

Without it, it runs at around 900-1000 fps, but it gives me a black background, which is not what I want.

I’m wondering if a particle system is a better option for a snowfall effect or if there is a much simpler way to achieve this via custom Shader.

Any advice?

Can you test the shader without anything else in the scene though? Test the performance with just the quad, with the discard enabled and with it disabled. Since you haven’t disabled ZWrite, the way depth culling happens could be coming into play here.
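For that test, a minimal sketch of the render state might look like the fragment below. It assumes the same SubShader as posted, with only the `ZWrite Off` line added; everything inside the Pass stays unchanged.

```shaderlab
SubShader
{
    // Draw with the other transparent geometry, after opaque objects.
    Tags { "RenderType"="Transparent" "Queue"="Transparent" }

    // Standard alpha blending: the result depends on the returned alpha.
    Blend SrcAlpha OneMinusSrcAlpha

    // Added for the test: stop the quad from writing to the depth buffer.
    ZWrite Off

    Pass
    {
        // ... CGPROGRAM block exactly as in the posted shader ...
    }
}
```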

Also, as mentioned, you need to make your returned alpha value 0 so that your blend mode can actually do something and the black pixels actually become transparent. Test that instead of the discard.

A particle system could be better, but it’s hard to say without seeing the scale/quality of the effect you’re going for.

Okay so I’m super confused now :-/

ZWrite Off - did nothing, still 200 fps
Returning alpha 0 - makes the snowflakes disappear, and the fps is still around 200

I also tried the following:

float x = float(0.05);

if(acc.x < x && acc.y < x && acc.z < x)
    return float4(0,0,0,0);

return float4(1,1,1,1);

That still gives around 200 fps.

With nothing but the camera and the quad: around 220 fps.

With anything else and the quad: around 200 fps.
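One thing that stands out in the code tried above: the loop only ever adds float3(t,t,t) to acc, so all three channels are always equal and a single channel carries the whole result. A possible branch-free variant, sketched here with a made-up helper name (`snow_with_soft_alpha`), uses that value directly as the alpha:

```hlsl
// Hypothetical helper (not in the original shader): uses the accumulated
// snow value itself as the alpha, so no comparison or branch is needed.
float4 snow_with_soft_alpha(float3 acc)
{
    // acc.x == acc.y == acc.z in the posted loop, so one channel is enough.
    // saturate clamps the value to the 0..1 range expected for alpha.
    half alpha = saturate(acc.x);

    return float4(1, 1, 1, alpha);
}
```

This changes the look slightly, since faint flakes become semi-transparent instead of being cut off at a threshold. It is only a sketch of the alpha handling and does nothing about the per-pixel loop, so any frame rate difference would have to be measured.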

One interesting thing I have noticed is that when the quad with the shader is active, the CPU main and render thread figures jump from 1 and 0.8 to 2.5 and 4.6 respectively. Honestly, I’m not too clued up on threads and such, but maybe someone can explain whether this is of any concern.

What I am aiming for is a snowfall that covers the camera and gives the impression of depth and a blizzard/storm sort of environment. I thought a custom shader would be good, as I can always reuse it, and when it is applied to a material I can control the settings quite easily.

I’m curious to know what the most common method and the most resource-efficient method of achieving such a thing are.

In the meantime, I will probably try this shader in a new Unity project just to see if that makes a difference.