Most efficient way to render particle sprites

Hello!

I’ve been using Unity for a while, and I’m currently on a project where I’ve been encouraged to push technical limits, which is always fun as a developer! I know a bit about the Unity pipeline, but not so much about the finer details, especially batching draw calls, etc. Honestly, that’s an area where my knowledge of engines in general drops off a bit, so forgive my ignorance.

So currently I’m working on a massive GPU-based particle system to simulate some natural phenomenon. (I built a GPU-based numerical ODE solver, which I’m pretty excited about!) Anyway, I’m holding position and other per-particle data in a ComputeBuffer that I pass to a shader as a StructuredBuffer. With this in place, I’m looking at different approaches for rendering the particles. They will most likely be textured quads, but potentially with a few extra triangles if I can manage. Ideally I’d like a single system to render up to 1,000,000 particles at VR frame rates (at least 60 fps, 90 much preferred). I don’t think that’s unreasonable with new hardware. I’ve been developing on an MSI laptop with a GTX 1060 inside.
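For sizing the buffer described above, here is a rough sketch of the stride and memory arithmetic. It assumes a hypothetical particle layout (position, velocity, age, size: eight 32-bit floats); your real struct will differ. The stride is the per-element byte size you would pass as the second argument of Unity’s `ComputeBuffer(count, stride)` constructor, and it has to match the HLSL struct read through the StructuredBuffer.

```python
import struct

# HYPOTHETICAL per-particle layout, mirroring an HLSL struct such as
#   struct Particle { float3 position; float3 velocity; float age; float size; };
# "<" means little-endian with no padding; each "f" is a 32-bit float.
PARTICLE_FORMAT = "<3f3f2f"


def particle_stride() -> int:
    """Bytes per particle, i.e. the stride for the compute buffer."""
    return struct.calcsize(PARTICLE_FORMAT)


def buffer_bytes(particle_count: int) -> int:
    """Total memory for the particle buffer, ignoring driver overhead."""
    return particle_count * particle_stride()


if __name__ == "__main__":
    stride = particle_stride()
    total = buffer_bytes(1_000_000)
    print(f"stride = {stride} bytes; 1M particles = {total / 2**20:.1f} MiB")
```

With this assumed layout, a million particles come to roughly 30 MiB, which is modest next to the several GB on a GTX 1060, so the buffer itself is unlikely to be the limiting factor.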

The approaches I’ve been trying:

  1. Geometry shader & Graphics.DrawProcedural (obviously not the fastest, but convenient for testing)

  2. Batched meshes with many quads per mesh.
    I’ve been following the rendering approach (just the rendering part) from:
    GitHub - i-saint/MassParticle
    Interestingly, I don’t think this system actually benefits from dynamic batching, because each mesh has too many vertices and uses its own material instance. By reducing the maximum vertices per mesh and using shared materials, I got dynamic batching working and gained 10-20 fps.

  3. Regular mesh instancing.
    Graphics.DrawMeshInstanced()
    This seems to break down pretty quickly, and frame rates plummet.

  4. Trying to create some mix of batching and instancing.
    I haven’t quite gotten this to work, but the idea is a mesh with many quads (staying within the vertex limit for dynamic batching) that is then instanced. Early tests suggest this isn’t the answer.
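To compare options 2 and 4, it can help to count how many meshes (and therefore draw calls before any batching) a given per-mesh vertex cap implies. This is a minimal sketch that assumes 4 unshared vertices per quad. The 65535 cap reflects a 16-bit index buffer, while the 300 cap is only an assumed, illustrative dynamic-batching limit that you should check against the Unity manual for your version. The function names are hypothetical.

```python
import math
from typing import Iterator, Tuple

VERTS_PER_QUAD = 4  # assumption: one quad per particle, corners not shared


def quads_per_mesh(max_verts_per_mesh: int) -> int:
    """How many whole quads fit under the vertex cap."""
    quads = max_verts_per_mesh // VERTS_PER_QUAD
    if quads == 0:
        raise ValueError("vertex cap is smaller than a single quad")
    return quads


def meshes_needed(particle_count: int, max_verts_per_mesh: int) -> int:
    """Number of batch meshes when each mesh is filled with whole quads."""
    return math.ceil(particle_count / quads_per_mesh(max_verts_per_mesh))


def mesh_ranges(particle_count: int, max_verts_per_mesh: int) -> Iterator[Tuple[int, int]]:
    """Yield (first_particle, particles_in_mesh) for each batch mesh."""
    step = quads_per_mesh(max_verts_per_mesh)
    for first in range(0, particle_count, step):
        yield first, min(step, particle_count - first)


if __name__ == "__main__":
    for cap in (65535, 300):
        print(cap, "verts/mesh ->", meshes_needed(1_000_000, cap), "meshes")
```

With the 16-bit cap, a million particles need 62 meshes; with a small, batching-sized cap it is over 13,000. Since Unity’s dynamic batching transforms vertices on the CPU, that count suggests why batching-based options struggle at this scale.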

I think the VERY fastest approach would be to use Graphics.DrawProceduralIndirect with 4x as many vertices as I have particles. The problem with this approach is that I’d have to do lighting and shadows completely manually (correct me if I’m wrong).
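Here is a quick budget for the 60/90 fps target, as a sketch. It assumes 4 vertices per particle and, optionally, that the particles are drawn once per eye (`views=2`). That second assumption concerns the VR rendering path and is not something stated in the post.

```python
def frame_budget_ms(fps: float) -> float:
    """Milliseconds available per frame at a given frame rate."""
    return 1000.0 / fps


def vertices_per_frame(particles: int, verts_per_particle: int = 4, views: int = 1) -> int:
    """Vertex shader invocations per frame, assuming each quad corner is shaded once per view."""
    return particles * verts_per_particle * views


def vertices_per_second(particles: int, fps: float, verts_per_particle: int = 4, views: int = 1) -> float:
    """Sustained vertex throughput needed to hold the given frame rate."""
    return vertices_per_frame(particles, verts_per_particle, views) * fps


if __name__ == "__main__":
    for fps in (60, 90):
        print(f"{fps} fps: {frame_budget_ms(fps):.2f} ms/frame, "
              f"{vertices_per_second(1_000_000, fps, views=2) / 1e6:.0f}M verts/s (two views)")
```

At 90 fps, the whole frame (simulation plus rendering) has about 11.1 ms, and a single view of a million quads is 4M vertices per frame, or 360M vertices per second.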

General notes or ideas are very welcome. I have no hardware limitations; it will most likely be GTX 1080s for the installation. I’m just trying to see how far I can push it and what the best approach is. Thanks for your help!

I would, as you imply, avoid a GS solution. However, if you have a compute buffer full of particle data, you can very easily replace the GS with a vertex shader that loads the compute buffer data. Just send 4x the verts to DrawProcedural, and use SV_VertexID (or SV_InstanceID) to decode which corner of the quad you are on.
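You can sketch the vertex-ID decoding described above outside the GPU. The Python below mirrors the index math (particle = id / 4, corner = id % 4) and the camera-facing expansion. In practice this logic lives in the HLSL vertex shader that reads the StructuredBuffer. All names are hypothetical. The 4-vertices-per-quad layout assumes a quad topology is available; with a plain triangle list you would use 6 vertices per particle and a corner table like `TRI_CORNERS`.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]

# Quad corners in local units, in the order the 4 vertices of a quad are emitted.
QUAD_CORNERS = ((-0.5, -0.5), (0.5, -0.5), (0.5, 0.5), (-0.5, 0.5))
QUAD_UVS = ((0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0))
# For a triangle list: two triangles (0, 1, 2) and (0, 2, 3) per quad.
TRI_CORNERS = (0, 1, 2, 0, 2, 3)


def decode_quad_vertex(vertex_id: int) -> Tuple[int, int]:
    """4 vertices per particle: returns (particle_index, corner_index)."""
    return vertex_id // 4, vertex_id % 4


def decode_tri_vertex(vertex_id: int) -> Tuple[int, int]:
    """6 vertices per particle (triangle list): returns (particle_index, corner_index)."""
    return vertex_id // 6, TRI_CORNERS[vertex_id % 6]


def billboard_vertex(particle: int, corner: int, positions: Sequence[Vec3], size: float,
                     cam_right: Vec3, cam_up: Vec3) -> Tuple[Vec3, Tuple[float, float]]:
    """Offset the particle centre along the camera's right/up axes so the quad faces the camera."""
    cx, cy = QUAD_CORNERS[corner]
    center = positions[particle]
    world = tuple(p + (cx * r + cy * u) * size for p, r, u in zip(center, cam_right, cam_up))
    return world, QUAD_UVS[corner]


if __name__ == "__main__":
    positions = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
    right, up = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)
    for vid in range(4, 8):  # the four vertices of particle 1
        print(vid, billboard_vertex(*decode_quad_vertex(vid), positions, 1.0, right, up))
```

If you use SV_InstanceID instead (4 vertices per instance, one instance per particle), the particle index is simply the instance ID and the corner is the vertex ID, so the division and modulo disappear.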

I wouldn’t bother with any of these (options 2-4) :) The modification I suggest to option 1 will give you the full rendering potential of the GPU without any of Unity’s general-purpose rendering machinery getting in the way.

I think you can still write a surface shader with the approach I suggest for option 1, although some lighting shader constants may not get set this way. Regarding shadows, though, yes, you’re definitely on your own there. Hopefully we provide an option to insert the shadow render call into a command buffer or something (I’ve never tried it).

Yep, that’s what I was suggesting at the very end of my post. Definitely thinking this would be the fastest.

That’s what I was afraid of. The system I’m creating definitely looks really amazing when all the pieces cast and receive shadows to each other. I’ll have to see if this is something I feel like I can implement.

This is very interesting! So this would potentially create the shadow map for me, and I could just do a multiplication in the fragment shader? I’ll definitely give it a shot.

Oops, yes, I somehow missed that, sorry :)

It shouldn’t be too hard. For shadow casting, it’s just a shadow pass in your shader that does most of the same vertex shader work, but whose pixel shader only reads the textures used for alpha testing (clip).

Calling it at the right time in the scene drawing may be trickier, but that’s hopefully where the command buffer stuff comes in… I think you want to use something like https://docs.unity3d.com/ScriptReference/Rendering.ShadowMapPass.html

I think this is what you’ll need for shadow receiving:
https://forum.unity3d.com/threads/adding-shadows-to-custom-shader-vert-frag.108612/#post-724147
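Here is a rough, engine-agnostic picture of the “multiplication in the fragment shader” idea. A receiving fragment compares its depth as seen from the light against the depth stored in the shadow map, and the resulting attenuation scales only the direct light. This is a generic single-sample shadow-mapping sketch with hypothetical names and an assumed bias value. Unity’s built-in shadow macros (covered in the linked thread) handle the actual lookup and filtering for you.

```python
from typing import Tuple

RGB = Tuple[float, float, float]


def shadow_attenuation(fragment_depth: float, shadow_map_depth: float, bias: float = 0.005) -> float:
    """1.0 if lit, 0.0 if shadowed. Depths are in light space, larger = farther from the light.
    The bias (an assumed value) avoids self-shadowing artifacts ("shadow acne")."""
    return 1.0 if fragment_depth - bias <= shadow_map_depth else 0.0


def shade(albedo: RGB, n_dot_l: float, light: RGB, ambient: RGB, atten: float) -> RGB:
    """Lambert diffuse with the shadow term multiplied into the direct light only."""
    direct = max(n_dot_l, 0.0) * atten
    return tuple(a * (amb + lc * direct) for a, lc, amb in zip(albedo, light, ambient))


if __name__ == "__main__":
    albedo, light, ambient = (0.8, 0.6, 0.4), (1.0, 1.0, 1.0), (0.1, 0.1, 0.1)
    lit = shade(albedo, 0.9, light, ambient, shadow_attenuation(0.50, 0.60))
    shadowed = shade(albedo, 0.9, light, ambient, shadow_attenuation(0.70, 0.60))
    print("lit:", lit, "shadowed:", shadowed)
```

Keeping the ambient term outside the multiplication means shadowed particles still receive some light instead of going fully black.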

Hey, just wanted to share my results: 1,000,000 quads + compute dynamical system at 30 fps in the editor. Not bad at all!
It’s only chunky because of OBS. Need a better way to render…

Thanks for the tips.


Nice job!