Help With Custom ECS Rendering (Without Compute Buffers)

I have a project that renders a lot of sprites using a custom SRP (basically a simplified URP which does forward rendering of sprites).

I wanted to test ECS with it, as it’s already set up in such a way that most entities are just states packed in memory with systems updating them in batch, and it uses GameObjects with SpriteRenderers as proxies. Before actually doing the logic part, which is relatively simple, I decided to see if I could figure out how to render sprites using ECS, and I’ve run into a brick wall - there seems to be very little documentation (if any) on how to properly interface ECS with external systems.

Some of the options I looked at that seem like a no-go:

  1. Unity’s HybridRenderer - it relies on compute buffers, which is kind of a deal breaker, and it also serves as a very poor learning platform since it’s super cryptic and obtuse.
  2. Using Graphics.DrawMeshImmediate - that’s another solution I’ve seen come up quite a lot, but it basically short-circuits the whole render pipeline, which is not desirable.

I’ve implemented a very naive system that queues draw calls from entities, but it seems to be very inefficient in terms of memory access when reading LocalToWorld data, which makes me suspect I’m approaching this in the wrong way. Another specific implementation issue I’ve run into is type compatibility between float4x4 and the Matrix4x4 used by the CommandBuffer API. Conversion is so slow that the most efficient method I’ve found is storing the transformation as a Matrix4x4 at the component level to avoid it.

So to summarize, I would appreciate any help with

  1. Resources on how to properly communicate between ECS and the render pipeline
  2. Any practical pointers on how to implement this test system better (see attached)

And the implementation of the SpriteRenderSystem for reference:

using System.Runtime.CompilerServices;
using Unity.Entities;
using Unity.Collections;
using Unity.Mathematics;
using UnityEngine;
using UnityEngine.Rendering;

namespace MC.Entities.Sprites
{
    [UpdateInGroup(typeof(PresentationSystemGroup))]
    public class SpriteRenderSystem : ComponentSystem
    {
        private Mesh m_mesh;
        private Material m_material;
       
        private EntityQuery m_entityQuery;
        private CommandBuffer m_cb;
        private Matrix4x4[] m_modelBuffer;

        private const int MAX_BATCH_SIZE = 1023;
       
        protected override void OnCreate()
        {
            CreateRenderData();
            m_modelBuffer = new Matrix4x4[ MAX_BATCH_SIZE ];
           
            m_entityQuery = GetEntityQuery(new EntityQueryDesc()
            {
                All = new ComponentType[]
                {
                    // The system only reads the transforms, so request read-only access
                    ComponentType.ReadOnly<TRS2D>(),
                }
            });
        }

       
        protected override void OnDestroy()
        {
            if (m_cb != null)
            {
                if (Camera.main != null)
                    Camera.main.RemoveCommandBuffer(CameraEvent.AfterForwardOpaque, m_cb );
               
                // The buffer was obtained from CommandBufferPool, so return it to the pool
                CommandBufferPool.Release(m_cb);
            }
           
            if ( m_material != null )
                Object.Destroy( m_material );
           
            if ( m_mesh != null )
                Object.Destroy( m_mesh );
        }


        protected override void OnUpdate()
        {
            CreateCommandBuffer();
            if (m_cb == null)
                return;

            // Clear first so last frame's draws don't linger when the query is empty
            m_cb.Clear();

            var matrices = m_entityQuery.ToComponentDataArray<TRS2D>(Allocator.Temp);
           
            if (matrices.Length > 0)
            {
               
                var offset = 0;
                while (offset < matrices.Length)
                {
                    var step = 0;
                    while (step < MAX_BATCH_SIZE && offset + step < matrices.Length)
                    {
                        // Component type uses float4x4 internally, convert it - this is way worse
                        //ConvertMatrix(matrices[offset+step].matrix, ref m_modelBuffer[step]);
                        var item = matrices[offset + step];
                        m_modelBuffer[step] = item.matrix;
                        step++;
                    }
                   
                    m_cb.DrawMeshInstanced(m_mesh, 0, m_material, 0, m_modelBuffer, step);
                    offset += step;
                }
            }

            matrices.Dispose();
        }


        private void CreateRenderData()
        {
            var shader = Shader.Find("Test/TestInstanced");
            m_material = new Material( shader );
            m_material.enableInstancing = true;
           
            m_mesh = new Mesh();
            m_mesh.SetVertices(new [] {
                new Vector3( -0.5f, -0.5f, 0.0f ),
                new Vector3( -0.5f, 0.5f, 0.0f ),
                new Vector3( 0.5f, 0.5f, 0.0f ),
                new Vector3( 0.5f, -0.5f, 0.0f ),
            });
            m_mesh.SetTriangles(new [] {
                0, 1, 2,
                0, 2, 3
            }, 0);

            m_mesh.RecalculateBounds();
            m_mesh.UploadMeshData(true);
        }
       

        private void CreateCommandBuffer()
        {
            if (m_cb != null)
                return;
           
            var camera = Camera.main;
            if (camera == null)
                return;
           
            m_cb = CommandBufferPool.Get("ECS Sprite Render");
            camera.AddCommandBuffer(CameraEvent.AfterForwardOpaque, m_cb );
        }

        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        private void ConvertMatrix(float4x4 inval, ref Matrix4x4 outval )
        {
            // Manually copy the data by ref, column by column
            // (float4x4 columns c0..c3 map to Matrix4x4 columns m_0..m_3).
            // This was 2x faster than using the new Matrix4x4 constructor
            outval.m00 = inval.c0.x;
            outval.m10 = inval.c0.y;
            outval.m20 = inval.c0.z;
            outval.m30 = inval.c0.w;
           
            outval.m01 = inval.c1.x;
            outval.m11 = inval.c1.y;
            outval.m21 = inval.c1.z;
            outval.m31 = inval.c1.w;
           
            outval.m02 = inval.c2.x;
            outval.m12 = inval.c2.y;
            outval.m22 = inval.c2.z;
            outval.m32 = inval.c2.w;
           
            outval.m03 = inval.c3.x;
            outval.m13 = inval.c3.y;
            outval.m23 = inval.c3.z;
            outval.m33 = inval.c3.w;
        }
    }
}

Thanks!

If you can use unsafe, you don’t need to convert LocalToWorld to Matrix4x4 element by element. A raw memory copy is really fast. As of now, it works for directly copying LocalToWorld to Matrix4x4, since both are 64 bytes of column-major floats.

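// Requires: using Unity.Collections; using Unity.Collections.LowLevel.Unsafe; using UnityEngine.Assertions;
// Copies `length` elements of `source`, starting at `index`, to the start of `destination`.
protected static unsafe void CopyNativeArrayToManagedArray<T1, T2>(NativeArray<T1> source, T2[] destination, int index, int length) where T1 : unmanaged where T2 : unmanaged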
{
    fixed (T2* destinationPtr = destination)
    {
#if ENABLE_UNITY_COLLECTIONS_CHECKS
        Assert.AreEqual(sizeof(T1), sizeof(T2));
#endif
        var destinationSlice = NativeSliceUnsafeUtility.ConvertExistingDataToNativeSlice<T1>(destinationPtr, sizeof(T2), length);
#if ENABLE_UNITY_COLLECTIONS_CHECKS
        NativeSliceUnsafeUtility.SetAtomicSafetyHandle(ref destinationSlice, AtomicSafetyHandle.GetTempUnsafePtrSliceHandle());
#endif
        destinationSlice.CopyFrom(source.Slice(index, length));
    }
}
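
For illustration, here is a rough sketch of how the helper above might replace the per-element copy loop in SpriteRenderSystem.OnUpdate. It is not the original poster's final code. It assumes the method lives in the same class as CopyNativeArrayToManagedArray (the helper is protected static), that unsafe code is enabled for the assembly, and that TRS2D holds nothing but a single 64-byte matrix so its size matches Matrix4x4. QueueInstancedDraws is a hypothetical name.

```csharp
// Hypothetical method, meant to sit inside SpriteRenderSystem next to the helper.
// Uses the m_cb, m_mesh, m_material, m_modelBuffer and MAX_BATCH_SIZE members
// from the original post, plus math.min from Unity.Mathematics.
private void QueueInstancedDraws(NativeArray<TRS2D> matrices)
{
    for (var offset = 0; offset < matrices.Length; offset += MAX_BATCH_SIZE)
    {
        // A full batch, or whatever remains at the tail of the array
        var count = math.min(MAX_BATCH_SIZE, matrices.Length - offset);

        // One block copy per batch instead of one element copy per sprite.
        // Assumption: sizeof(TRS2D) == sizeof(Matrix4x4); the helper asserts this
        // when collection checks are enabled.
        CopyNativeArrayToManagedArray(matrices, m_modelBuffer, offset, count);

        m_cb.DrawMeshInstanced(m_mesh, 0, m_material, 0, m_modelBuffer, count);
    }
}
```

The nested while loop in OnUpdate would then shrink to a single call, QueueInstancedDraws(matrices), placed between clearing the command buffer and disposing the temporary array.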

Amazing! That did the trick!
Rendering 100k sprites (which is more than I’ll probably ever need) with a cpu frame time of <8ms with deep profiling enabled or <6ms without.

The next question is more of an abstract design one - right now I’m just slapping command buffers on my camera. What would be a sensible way to communicate between the system generating the draw commands and the render pipeline?

Is there any way to await an ECS system externally (e.g. in the render pipeline) and just read the buffers there? Maybe I need a system that pushes the buffers to the render pipeline?
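
One way the "system pushes buffers to the pipeline" idea could look is sketched below. It is only a sketch under assumptions, not an established pattern from the thread: SpriteDrawQueue and all of its members are hypothetical names, and it assumes the custom SRP can call into a static class while it builds its own command buffer. Systems in PresentationSystemGroup run as part of the player loop before cameras render, so main-thread writes should be finished by the time the pipeline reads; the stored JobHandle covers the case where the matrices are filled by a job.

```csharp
using System.Collections.Generic;
using Unity.Jobs;
using UnityEngine;
using UnityEngine.Rendering;

// Hypothetical hand-off point between the ECS system and the custom SRP.
public static class SpriteDrawQueue
{
    private struct Batch
    {
        public Matrix4x4[] Matrices;
        public int Count;
    }

    private static readonly List<Batch> s_batches = new List<Batch>();
    private static JobHandle s_producer; // default handle completes immediately

    // Called by the ECS system at the start of its update.
    public static void BeginFrame()
    {
        s_batches.Clear();
        s_producer = default;
    }

    // Called by the ECS system once per batch (at most 1023 instances each).
    // Each batch needs its own array, because recording is deferred until the
    // pipeline renders and a shared array would be overwritten by then.
    public static void Submit(Matrix4x4[] matrices, int count)
    {
        s_batches.Add(new Batch { Matrices = matrices, Count = count });
    }

    // Optional: register the job that is still filling the submitted arrays.
    public static void SetProducer(JobHandle handle)
    {
        s_producer = handle;
    }

    // Called by the render pipeline, once per camera, while recording its pass.
    public static void Record(CommandBuffer cmd, Mesh mesh, Material material)
    {
        // "Await" the ECS side: block until the producing job has finished
        s_producer.Complete();

        foreach (var batch in s_batches)
            cmd.DrawMeshInstanced(mesh, 0, material, 0, batch.Matrices, batch.Count);
    }
}
```

With something like this, the system no longer touches Camera.main. The pipeline would call SpriteDrawQueue.Record(cmd, mesh, material) at the point in its frame where sprites belong, then submit cmd with ScriptableRenderContext.ExecuteCommandBuffer as it does for its other passes.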