How to sort 2D objects with the same Z position in a shader

I have been working on a DOTS + URP compatible solution for 2D sprite rendering. It uses the regular URP renderer (not the 2D one) to render sprites. There are some ugly things in my system that I want to get rid of.

In my system I can render sprites with different materials/shaders, but to do that I need to collect all of them, sort them, and finally render them in the resulting order. Because, as I mentioned, there can be more than one shader, there can also be multiple passes. I decided to take each sprite’s LTW (LocalToWorld matrix) and shift the position’s Z slightly so that sprites are displayed in order. For now the shift is .0001f per position in the order, because the camera can’t distinguish differences smaller than that. I’m not a huge fan of hidden constants, and this solution has other drawbacks as well: I have to adjust the camera’s clipping planes to fit all rendered sprites, the editor and runtime cameras have different minimum Z shifts, and in the scene view with 3D mode enabled the result looks weird, like an accordion, which becomes noticeable when there are a lot of sprites in the scene.

In short, I just change the Z position of the LTW that I pass to the shader’s matrix buffer, so that I can render all passes at once and keep the sprites in the proper order.

So, is there a good, clean way to do what I want?

Shader code:

//TODO: try to simplify #if defined strings
Shader "Universal Render Pipeline/2D/General Sprite Shader"
{
    Properties
    {
        _MainTex("_MainTex", 2D) = "white" {}
    }

    HLSLINCLUDE
    #include "Packages/com.unity.render-pipelines.universal/ShaderLibrary/Core.hlsl"

    //set here material properties
    ENDHLSL

    SubShader
    {
        Tags {"Queue" = "Transparent" "RenderType" = "TransparentCutout" "RenderPipeline" = "UniversalPipeline" }

        Blend SrcAlpha OneMinusSrcAlpha
        Cull Off
        ZWrite On

        Pass
        {
            Tags { "LightMode" = "UniversalForward" "Queue" = "Transparent" "RenderType" = "TransparentCutout"}

            HLSLPROGRAM
            #pragma vertex UnlitVertex
            #pragma fragment UnlitFragment

            #pragma target 4.5
            #pragma exclude_renderers gles gles3 glcore
            #pragma multi_compile_instancing
            #pragma instancing_options procedural:setup

            struct Attributes
            {
                float3 positionOS   : POSITION;
                float2 uv           : TEXCOORD0;
                UNITY_VERTEX_INPUT_INSTANCE_ID
            };

            struct Varyings
            {
                float4 positionCS       : SV_POSITION;
                float2 uv               : TEXCOORD0;
                float4 mainTexAtlasST   : ATLASST;
                UNITY_VERTEX_INPUT_INSTANCE_ID
            };

            TEXTURE2D(_MainTex);
            SAMPLER(sampler_MainTex);

            StructuredBuffer<float4x4>   _transformMatrixBuffer;
            StructuredBuffer<float4>     _mainTexSTOnAtlasBuffer; //ST means Scale + Translation
            StructuredBuffer<float4>     _mainTexSTBuffer;
            StructuredBuffer<int>        _flipBuffer;

            //called per instance by the procedural instancing setup
            void setup()
            {
#if defined(UNITY_PROCEDURAL_INSTANCING_ENABLED)
                unity_ObjectToWorld = _transformMatrixBuffer[unity_InstanceID];
#endif
            }

            float2 TilingAndOffset(float2 UV, float2 Tiling, float2 Offset)
            {
                //Tiling is like Width/Height ratio, like how much texture should be stretched
                //offset is just regular offset from 0,0
                //so when UV.x is 0/1 it is left/right UV coords of renderer rect
                return UV * Tiling + Offset;
            }

            Varyings UnlitVertex(Attributes attributes, uint instanceID : SV_InstanceID)
            {
                Varyings varyings = (Varyings)0;

#if defined(UNITY_PROCEDURAL_INSTANCING_ENABLED)
                //extract all CBuffer data here
                varyings.mainTexAtlasST = _mainTexSTOnAtlasBuffer[instanceID];
                float4 mainTexST = _mainTexSTBuffer[instanceID];
                int flipValue = _flipBuffer[instanceID];
#else
                //fallback if somehow instancing failed
                varyings.mainTexAtlasST = float4(1, 1, 0, 0);
                float4 mainTexST = float4(1, 1, 0, 0);
                int flipValue = 0;
#endif

                //runs setup() for this instance, then forwards the instance ID
                UNITY_SETUP_INSTANCE_ID(attributes);
                UNITY_TRANSFER_INSTANCE_ID(attributes, varyings);

                varyings.positionCS = TransformObjectToHClip(attributes.positionOS);

                float2 uv = attributes.uv;

                //flip uv if necessary
                uv.x = flipValue >= 0 ? uv.x : (1.0 - uv.x);

                //tiling (xy) and offset (zw) uv
                uv = TilingAndOffset(uv, mainTexST.xy, mainTexST.zw);

                //pass uv to fragment shader
                varyings.uv = uv;

                return varyings;
            }

            float4 UnlitFragment(Varyings varyings) : SV_Target
            {
                //finally frac uv and locate on atlas using tiling (xy) and offset (zw)
                varyings.uv = TilingAndOffset(frac(varyings.uv), varyings.mainTexAtlasST.xy, varyings.mainTexAtlasST.zw);

                float4 texColor = SAMPLE_TEXTURE2D(_MainTex, sampler_MainTex, varyings.uv);
                //alpha test: discard fragments with alpha below 0.5
                clip(texColor.w - 0.5);
                return texColor;
            }
            ENDHLSL
        }
    }

    Fallback "Sprites/Default"
}

The job that adjusts the LTWs, just to show what I actually do. Look at PER_INDEX_OFFSET * index:

        //namespaces used: Unity.Collections, Unity.Collections.LowLevel.Unsafe, Unity.Jobs, Unity.Mathematics
        internal struct FillMatricesArrayJob : IJobParallelFor
        {
            [ReadOnly] public NativeArray<SpriteData> spriteDataArray;
            [ReadOnly] public NativeList<RenderArchetypeForSorting> archetypeLayoutData;
            [WriteOnly][NativeDisableParallelForRestriction] public NativeArray<float4x4> matricesArray;

            private const float PER_INDEX_OFFSET = .0001f; //below this value camera doesn't recognize difference

            public void Execute(int index)
            {
                var spriteData = spriteDataArray[index];
                var renderPosition = spriteData.position - spriteData.scale * spriteData.pivot;
                //Z is shifted by the sprite's index in spriteDataArray
                matricesArray[archetypeLayoutData[spriteData.archetypeIncludedIndex].stride + spriteData.entityInQueryIndex] = float4x4.TRS
                (
                    new float3(renderPosition.x, renderPosition.y, PER_INDEX_OFFSET * index),
                    quaternion.identity, //no rotation (assumed)
                    new float3(spriteData.scale.x, spriteData.scale.y, 1f)
                );
            }
        }

Example of how it looks in the scene view

I’ve found that it is possible to override the depth buffer value by using the SV_Depth semantic as an output of the fragment shader. Simply put, the higher the pixel is, the smaller the depth it writes to the depth buffer: depth = 1 - 1.0 / ypos. But this approach really hurts GPU performance. I’m wondering whether there is a similar way to test pixels against some value that is more efficient than going through ZTest.
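For reference, writing depth from the fragment shader might look roughly like the sketch below. It is not the code used in the project: DepthVaryings, FragmentOutput, sortDepth and UnlitFragmentWithDepth are hypothetical names, the depth value is assumed to be computed in the vertex shader and passed down, and the texture macros are assumed to come from URP's Core.hlsl.

```hlsl
// Sketch only; assumes Core.hlsl is already included.
TEXTURE2D(_MainTex);
SAMPLER(sampler_MainTex);

// Hypothetical varyings: sortDepth carries the depth chosen in the vertex shader.
struct DepthVaryings
{
    float4 positionCS : SV_POSITION;
    float2 uv         : TEXCOORD0;
    float  sortDepth  : TEXCOORD1;
};

struct FragmentOutput
{
    float4 color : SV_Target;
    float  depth : SV_Depth; // replaces the depth produced by the rasterizer
};

FragmentOutput UnlitFragmentWithDepth(DepthVaryings varyings)
{
    FragmentOutput output;
    output.color = SAMPLE_TEXTURE2D(_MainTex, sampler_MainTex, varyings.uv);
    clip(output.color.w - 0.5);       // alpha test, as in the original fragment shader
    output.depth = varyings.sortDepth; // per-fragment depth override
    return output;
}
```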

The problem with this fragment shader output is that it disables early depth rejection, meaning every single sprite is fully rendered by the GPU, which then throws away the fragments that are occluded. Normally, one of the advantages of using alpha testing with ZWrite over alpha blending is that you get early depth rejection, so any parts of the surface that are fully occluded (at that point in rendering, which means sorting is still important) are “rejected” and don’t get calculated by the GPU.

There is a way around this though. Modify the z of the vertex shader’s output clip space instead. This is in effect exactly the same as using SV_Depth, but doesn’t disable early depth rejection.

 varyings.positionCS.z = wantedDepth * varyings.positionCS.w;
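
As a rough illustration of the line above, the vertex shader could derive wantedDepth from each sprite's position in the sorted order instead of from a world-space Z offset. This is only a sketch under assumptions: _sortIndexBuffer, _spriteCount and SortIndexToDepth are hypothetical names that do not exist in the shader from the question, higher sort indices are assumed to be drawn in front, and clip-space depth is assumed to run from 0 to w, which matches the renderers left after #pragma exclude_renderers gles gles3 glcore.

```hlsl
// Hypothetical per-instance data, filled on the CPU side like the other buffers.
StructuredBuffer<uint> _sortIndexBuffer; // position of each sprite in the sorted order
uint _spriteCount;                       // total number of sorted sprites

// Maps a sort index to a normalized depth strictly inside (0, 1).
float SortIndexToDepth(uint sortIndex, uint spriteCount)
{
    float t = (sortIndex + 0.5) / max(spriteCount, 1u);
#if UNITY_REVERSED_Z
    return t;       // reversed Z: larger depth is closer to the camera
#else
    return 1.0 - t; // conventional Z: smaller depth is closer to the camera
#endif
}

// Inside UnlitVertex, after the clip-space position has been computed:
//     varyings.positionCS = TransformObjectToHClip(attributes.positionOS);
//     float wantedDepth = SortIndexToDepth(_sortIndexBuffer[instanceID], _spriteCount);
//     varyings.positionCS.z = wantedDepth * varyings.positionCS.w;
```

With something like this, the Z in the LTW matrices could stay untouched, so the ordering should no longer depend on the camera's clipping planes. The trade-off is that the sprites' depth no longer reflects their real distance from the camera, which matters if they have to intersect other 3D geometry.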