Good day, I have a big problem: I can't figure out how to place many objects without losing performance. I've tried simply instantiating them, I've tried combining meshes, and of course the Graphics draw calls, and the results are bad.
Here is an example with Graphics.DrawMeshInstancedProcedural().
I can't figure out how to render a lot of objects (grass, for example) without the performance dropping too much...
What is the best way?
How can I achieve this?
I've already tried a lot of things and the results are very bad when rendering more than 2-3k objects, yet in games I see huge numbers of objects while plenty of other work is going on at the same time...
Upd. I ran a larger-scale test with a bigger area and more objects.
I changed the approach a little and replaced Graphics.DrawMeshInstancedProcedural with Graphics.DrawMeshInstancedIndirect. No batches are saved now, and there are only about 500 batches in total even though I'm rendering 100k objects. But still, whether it's 100k or 2k objects, it stays at ±30 fps. What else can be changed in this situation?
At least something... I don't understand it: the profiler shows a high load on the CPU. The character isn't the cause, because without the grass the fps is above 150 and the CPU load is small...
We can’t really tell what’s causing the issue without seeing more of your code. Make sure all data is handled on the GPU and don’t create or dispose buffers every frame. I’ve also been through this and made a grass instancer with fairly good performance. You can look at the code and maybe it can help you.
{
    // fill the matrices of the objects
    List<Matrix4x4> _matricesList = new List<Matrix4x4>();
    while (...) {
        ...
        _matricesList.Add(Matrix4x4.TRS(_pos, _rot, _scale));
        ...
    }

    // upload the matrices into the shader buffer (once)
    matrixBuffer = new ComputeBuffer(_matricesList.Count, sizeof(float) * 16);
    matrixBuffer.SetData(_matricesList.ToArray());
    material.SetBuffer("_Matrices", matrixBuffer);

    // fill the indirect args (so the whole buffer doesn't have to be re-sent constantly)
    uint[] args = new uint[5] { 0, 0, 0, 0, 0 };
    args[0] = (uint)mesh.GetIndexCount(0);
    args[1] = (uint)_matricesList.Count;
    args[2] = (uint)mesh.GetIndexStart(0);
    args[3] = (uint)mesh.GetBaseVertex(0);
    matrixArgsBuffer = new ComputeBuffer(1, args.Length * sizeof(uint), ComputeBufferType.IndirectArguments);
    matrixArgsBuffer.SetData(args);
}

private void Update() {
    Graphics.DrawMeshInstancedIndirect(mesh, 0, material, renderBounds, matrixArgsBuffer);
}
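Since the snippet above creates the ComputeBuffers once, they should also be released exactly once. A minimal sketch of the other half of that lifecycle, assuming the same field names as above (unreleased buffers leak GPU memory and produce Unity warnings):

```csharp
// Sketch: the buffers are created once in the setup above and reused every
// frame in Update(); release them once when the component is disabled.
private void OnDisable()
{
    matrixBuffer?.Release();
    matrixBuffer = null;
    matrixArgsBuffer?.Release();
    matrixArgsBuffer = null;
}
```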
It seems to me that the problem is that I call it every frame, but I haven't found a better method, in the sense of not having to issue the call every frame...
Calling DrawMeshInstancedIndirect every frame is fine and required to render the objects. Also note that GPU instancing is fast but can quickly reach its limits if you don't do any culling.
And if you're using a custom shader for the material, it could possibly be optimized too.
Do you also read back some data from the GPU? If so, that’s a big bottleneck for performance.
One thing I’ve noticed is that you seem to create multiple buffers in a loop. Depending on your goals it may be better to have one large buffer.
I create one buffer with the matrices, which I fill once (in this example), and then a small buffer with the arguments that say what to run and how many.
(I’ll be honest, GPT advised this)
> Do you also read back some data from the GPU? If so, that’s a big bottleneck for performance.
Where am I doing this?
Maybe I'm missing something...
And regarding your examples: as I understand it, I do the same, BUT you have an unlit, simplified shader, hence such high performance, while I launched with a full-fledged HDRP Lit shader. Now I've switched to URP and trimmed the standard Lit shader a little, and the performance is much higher, even considering that I just combined the meshes.
In general I understood the following: first, the biggest performance problem is precisely the shader's weight; second, I can't do without calling the draw in Update. As I understand it, this is the best way.
And by the way, are all objects ultimately rendered through these methods?
And regarding Graphics.DrawMesh: I understood that GameObjects with a MeshRenderer use Graphics.DrawMesh at a low level to display the mesh, if I'm not mistaken, and that's also how the terrain works.
I reworked the URP shader for myself so that everything is there and the grass moves, i.e. the grass has normals and lighting and the fog works. Then I used the script that was recommended to me (Occlusion Culling + High performance); of course I rewrote it for my case, since my grass works without terrain.
In general, the result is as follows: with 2,500,000 blades of grass (just for testing) the fps stays at 60-70. This is very cool, especially considering that the shader supports everything the standard Lit does.
But one question still torments me: with 2,500,000 blades of grass the fps is 60-70, and with 10,000 blades it's the same. I don't understand this, because logically, the fewer the instances, the higher the performance, isn't it? (I don't understand.)
And one more question, maybe not quite on topic, and I don't know how to implement it: if I have many unique meshes (but with the same material), is it possible to render them the same way? I just don't know yet how to handle different ones, because this method only works with a single mesh, and I have thousands of them...
I've also noticed that there is not much performance difference between using very many grass blades and very few. I don't know exactly why, but it has something to do with how GPUs work; presumably the frame cost is dominated by fixed per-frame overhead and fill rate rather than by per-instance work.
And about the thing with grids: I already implemented that but haven't published it yet because it's way more complicated. You can see a video of it here (at 0:50 there is a little behind-the-scenes section).
In a nutshell: you can create another grid layer of big chunks on your terrain, just like the small chunks. Each big chunk then basically runs the script that you're currently using. They need their own material, compute shader instance and buffers.
You'll also need to determine the chunks around the camera to know which chunks to render.
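The "chunks around the camera" step can be plain math, independent of Unity types. A minimal sketch, where the chunk size, the radius, and all names are my assumptions, not something from the project above:

```csharp
using System;
using System.Collections.Generic;

// Sketch: pick the big chunks whose centers lie within `radius` of the
// camera on the XZ plane. Chunks are addressed by integer grid coords.
public static class ChunkSelector
{
    public static List<(int x, int z)> VisibleChunks(
        float camX, float camZ, float chunkSize, float radius)
    {
        var result = new List<(int, int)>();
        // Only scan the grid cells that can possibly be in range.
        int minX = (int)Math.Floor((camX - radius) / chunkSize);
        int maxX = (int)Math.Floor((camX + radius) / chunkSize);
        int minZ = (int)Math.Floor((camZ - radius) / chunkSize);
        int maxZ = (int)Math.Floor((camZ + radius) / chunkSize);

        for (int x = minX; x <= maxX; x++)
        for (int z = minZ; z <= maxZ; z++)
        {
            float cx = (x + 0.5f) * chunkSize;   // chunk center
            float cz = (z + 0.5f) * chunkSize;
            float dx = cx - camX, dz = cz - camZ;
            if (dx * dx + dz * dz <= radius * radius)
                result.Add((x, z));
        }
        return result;
    }
}
```

Each chunk in the returned list would then get its draw call issued; chunks outside the radius are simply skipped that frame.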
Well, if the same fps with different amounts of grass is normal and nothing can be done about it, then as far as rendering a lot of grass goes, this method is the best one.
And about how to load large chunks (by a large chunk I mean a buffer with small chunks): I think it's better to write the grass data (the transforms, or just the positions) to files and load it into the buffers when needed (when the camera is close to a large chunk). That's not a difficult task.
It remains to figure out how to display many objects with different meshes (each mesh is unique). What is the best thing to use for that?
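For the file idea, a flat binary layout keeps loading cheap. A minimal sketch, assuming positions are stored as raw floats (x, y, z per blade); the class and path scheme are made up for illustration:

```csharp
using System.IO;

// Sketch: write the grass positions of one big chunk to a binary file
// and read them back as a flat float array (x, y, z per blade).
public static class GrassChunkFile
{
    public static void Save(string path, float[] positions)
    {
        using var w = new BinaryWriter(File.Open(path, FileMode.Create));
        w.Write(positions.Length);              // element count header
        foreach (float f in positions) w.Write(f);
    }

    public static float[] Load(string path)
    {
        using var r = new BinaryReader(File.OpenRead(path));
        int count = r.ReadInt32();
        var positions = new float[count];
        for (int i = 0; i < count; i++) positions[i] = r.ReadSingle();
        return positions;   // ready to upload with ComputeBuffer.SetData
    }
}
```

The loaded array can go straight into a ComputeBuffer via SetData when the camera approaches the chunk, and the buffer can be released again when the chunk goes out of range.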
Storing the data in files does not make sense, at least for my approach, because it can generate the grass so fast that you don't notice it between frames. Reading from files could have a bigger impact than that.
Also, GPU instancing does not work for different meshes; it can only be used with the same mesh.
But a little workaround I'm using for LODs (which are literally different meshes) is changing the vertex buffer:
Storing the vertex data of the LODs in different buffers:
_lod0VerticesBuffer = new ComputeBuffer(m0.vertices.Length, 3 * sizeof(float));
_lod0VerticesBuffer.SetData(m0.vertices);
Shader.SetGlobalBuffer("lod0Vertices", _lod0VerticesBuffer);
The compute shader then assigns a LOD level of 0, 1 or 2 to the chunk (or instance).
In the vertex shader you can then use these buffers like this:
struct VertexInput
{
float4 vertex : POSITION;
uint id : SV_VertexID;
};
//...
StructuredBuffer<float3> lod0Vertices;
//...
VertexOutput vert (VertexInput input, uint instanceID : SV_InstanceID)
{
if (instanceData.lodIndex == 0) vertex = lod0Vertices[input.id];
else if (instanceData.lodIndex == 1) vertex = lod1Vertices[input.id];
else if (instanceData.lodIndex == 2) vertex = lod2Vertices[input.id];
//...
I have a slightly different situation. I have something like a 3D terrain where the mesh is generated in separate 10x10x10 chunks, but these are walls, ceiling and floor, and vegetation is needed too. The underground rooms can be large and need a lot of grass there. The grass sits in static places, although sometimes you can dig or destroy things, which removes the grass. So it's better to store the positions of this grass in a file and load them when the terrain chunk becomes visible, and conversely, unload them from the buffer when the character has moved far from that chunk. Something like that...
Upd. ChatGPT advised trying the following scheme: transfer the vertices with their positions to the compute shader all at once (C# > ComputeBuffer > shader). Of course I understand that the chat sometimes talks nonsense, but it seems to me there is something in this idea. I mean displaying many unique meshes: they have the same material, the only difference is in the vertices, and to not bother with UVs you can use triplanar mapping...
For destroying the grass, storing something like the destroyed position in a file could work. Or maybe write that into a texture and just modify the vertices so that the grass at that location isn't visible any more. Then you don't need to modify the buffer, which makes things a bit easier.
Also keep in mind that your grids will need their own material instances (but the same shader). That way each grid can have different meshes.
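A minimal sketch of that setup, where `grassShader` and the per-chunk buffer are assumed names: each chunk instantiates its own Material from the shared shader, so the buffer bindings don't collide between chunks.

```csharp
// Sketch: one material instance per chunk, all sharing the same shader.
// Each chunk can then bind its own matrix/vertex buffers independently.
Material chunkMaterial = new Material(grassShader);
chunkMaterial.SetBuffer("_Matrices", chunkMatrixBuffer);
```

Remember to Destroy() these material instances (and Release() the buffers) when a chunk is unloaded, since Unity does not clean them up automatically.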