I have implemented a very simple point-cloud system (like a particle system where the particles don’t move) using a geometry shader. It works like this: I use some noise functions to randomize 50k 3D positions and build a mesh with point topology from them; the geometry shader then takes those 50k vertices as the basis to generate quads on the fly. It works great, and Camera.Render time costs me around 1.6ms per frame (all 50k points are always visible and rendered).
Now, I tried to implement the same thing using compute shaders. So I do the following: I randomize the point positions inside the compute shader and store them in a buffer. Then I adapt the exact same geometry shader as before to retrieve the 50k points from that buffer instead of from a point mesh. In this version there is no mesh at all: I just call Graphics.DrawProcedural(MeshTopology.Points, 50000, 1) inside an OnRenderObject function. This also works perfectly, but oddly enough the performance is significantly worse: consistently around 2.8ms.
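For context, the setup described above looks roughly like this. This is a hedged sketch, not my exact code; the kernel name, buffer name, and component name are placeholders, and it assumes the material's vertex/geometry shader reads positions from the `_Points` StructuredBuffer:

```csharp
using UnityEngine;

public class PointCloudProcedural : MonoBehaviour
{
    public ComputeShader pointCompute;  // kernel that randomizes positions (hypothetical name below)
    public Material pointMaterial;      // vertex + geometry shader that reads _Points
    const int PointCount = 50000;
    ComputeBuffer pointBuffer;

    void Start()
    {
        // 3 floats per point; the data never leaves GPU memory after this.
        pointBuffer = new ComputeBuffer(PointCount, sizeof(float) * 3);
        int kernel = pointCompute.FindKernel("RandomizePoints");
        pointCompute.SetBuffer(kernel, "_Points", pointBuffer);
        pointCompute.Dispatch(kernel, Mathf.CeilToInt(PointCount / 64f), 1, 1);
        pointMaterial.SetBuffer("_Points", pointBuffer);
    }

    void OnRenderObject()
    {
        pointMaterial.SetPass(0);
        // No mesh: the GPU just invokes the vertex/geometry shader 50k times.
        Graphics.DrawProcedural(MeshTopology.Points, PointCount, 1);
    }

    void OnDestroy() { pointBuffer.Release(); }
}
```

(In newer Unity versions this overload is deprecated in favor of Graphics.DrawProceduralNow.)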
I was really surprised by that. I thought it should actually be much faster than the first approach, since no data is passed between the CPU and GPU every frame: all the data stays in GPU memory the whole time and is read there to generate the quads on the fly.
Why would that happen? Why would using a point mesh to fake a particle system in a geometry shader be faster than doing it all directly on the GPU with a compute shader + geometry shader?
Six years later and I’m still running into this dumb issue. Replying out of some kind of frustrated spite.
Just running a vertex shader, with literally everything else equal, both versions reading the geometry from structured buffers, the procedural version is consistently, measurably at least 35% SLOWER across multiple Unity versions. It’s complete hogwash for no reason.
But I literally cannot justify using procedural calls with that kind of an insane drop when it should be FASTER. So I have to sit here and feel like a goober, creating dummy meshes and matrix arrays just to call the draw-mesh function and get it to render the right number of vertices, obviously wasting memory and bandwidth in the process, but what else am I to do?
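For anyone wondering what that dummy-mesh workaround looks like: a minimal sketch, assuming the shader ignores the mesh vertex data entirely and reads real positions from a StructuredBuffer via SV_VertexID (the helper name is mine):

```csharp
using UnityEngine;
using UnityEngine.Rendering;

static class DummyPointMesh
{
    // Builds a point-topology mesh whose only job is to make the GPU
    // invoke the vertex shader `count` times; positions are throwaway.
    public static Mesh Create(int count)
    {
        var mesh = new Mesh { indexFormat = IndexFormat.UInt32 }; // >65k vertices
        mesh.vertices = new Vector3[count];   // dummy data, never read by the shader
        var indices = new int[count];
        for (int i = 0; i < count; i++) indices[i] = i;
        mesh.SetIndices(indices, MeshTopology.Points, 0);
        // Oversized bounds so the camera never frustum-culls the draw.
        mesh.bounds = new Bounds(Vector3.zero, Vector3.one * 10000f);
        return mesh;
    }
}

// Per frame, e.g.:
// Graphics.DrawMesh(dummyMesh, Matrix4x4.identity, pointMaterial, 0);
```

The wasted memory is exactly the dummy vertex and index arrays; everything meaningful still lives in the structured buffer.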
Even worse: as an asset developer, even if they fixed it I’d never be able to take advantage of the fix, since I need to target as many Unity versions as I can.
How about now? I tested DrawProcedural on a 2014 MacBook Pro with 2021.3.5f1 and it’s still very bad. Why is it so bad?

The default rendering path gives 238 fps, while DrawProceduralIndirectNow gives 73.7 fps! What the hell? So Unity can’t do a GPU-driven pipeline!