GPU instancing optimization

Hi, I had never worked with instancing until a few days ago, and I have a few questions I would like to ask. Currently, instead of using Graphics.DrawMeshInstanced, I’m using Graphics.DrawProcedural, so my first question is:

  1. Is there a difference between them? Can the same be achieved with both?

What I’m doing is attaching a few ComputeBuffers (indices and vertex attributes) to a material, and in the vertex shader I access them like this:

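// vertices and indices are the buffers bound to the material from the ComputeBuffers
v2f vert (uint vertex_id : SV_VertexID, uint instance_id : SV_InstanceID)
{
    v2f o;
    // Manual vertex fetch: look up the index first, then the vertex attributes
    vertex_attributes v = vertices[indices[vertex_id]];
    o.vertex = UnityObjectToClipPos(v.vertex);
    // instance_id is unused in this simplified version
    return o;
}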

But for some reason I’m not able to render as many instances as I expected (having seen all the demos with a ton of asteroids and the like). As in the example code above, I simplified everything as much as I could, but with 4000 instances I was getting 44 ms (on a Radeon HD 6870).
The problem, as far as I can tell, seems to be the number of vertices in my model (around 1000, so around 3000 triangles…), because the only way I found to reduce the frame time was to reduce either the number of instances or the number of vertices in the model.
Note that I also tried reducing the number of attributes I send per vertex and per instance (per vertex I only send the vertex position, and per instance the world position, so it’s just 6 floats, 24 bytes), but that didn’t seem to matter. So my second question is:

  2. Is there anything I can do to optimize this aside from reducing the number of vertices? I am aware that I can also implement culling on the GPU (and I plan to do so), but first I want to see how many instances I can get on screen while keeping a good or average framerate.

Any help is appreciated. Thanks!

I’m not sure DrawProcedural supports instancing, since it doesn’t even support shadows… Why are you using a ComputeBuffer?

Because I’m using a compute shader to calculate the positions of the objects.
And what do you mean by DrawProcedural not supporting instancing? I’m still trying to figure out the difference between DrawMeshInstanced and DrawProcedural, because as far as I can tell they should end up doing the same thing, right?
But in my tests, drawing 1000 objects takes 2.5 ms with DrawProcedural and 1.8 ms with DrawMeshInstanced, using the same bare-bones shader, so Unity must be doing some optimization that I’m not aware of…

I don’t know the differences between these functions, except that DrawMesh/DrawMeshInstanced support shadows and DrawProcedural does not.

So I would not be surprised if there are other differences.

This topic may give you some hints

I have never mixed instancing with a ComputeBuffer, since I always use it with a geometry shader (a single-point mesh from which I create the faces), so instancing is pointless in my case. (Instancing lets you store the mesh data only once GPU-side instead of once per instance.)

Alright, I see what’s going on now. I used Intel GPA to look at the calls being issued, and it seems that DrawProcedural internally calls DirectX’s DrawInstanced, while DrawMeshInstanced internally calls DirectX’s DrawIndexedInstanced, and this drastically reduces the number of vertex shader invocations.
So in conclusion, DrawProcedural does not seem to be a good fit if you are going to draw meshes with a large number of vertices. This is a shame, because I cannot do the culling in a compute shader and then call DrawMeshInstanced, but oh well.
And for future reference, a mesh with 618 verts and 738 tris takes (on my GPU):

  • with DrawProcedural: 10.7 ms
  • with DrawMeshInstanced: 3.8 ms
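
A rough way to see why the indexed call is cheaper is to count vertex shader invocations per instance. The sketch below is only a back-of-the-envelope model, not a measurement: it assumes the non-indexed draw runs the vertex shader once for every triangle corner, and that the indexed draw shades each unique vertex exactly once (the best case for vertex reuse). The function name is made up for illustration.

```python
def vertex_shader_invocations(vertex_count, triangle_count, indexed):
    """Rough per-instance vertex shader invocation count.

    Non-indexed draw: the GPU sees 3 * triangle_count unrelated vertex IDs,
    so the shader runs once for each triangle corner.
    Indexed draw (best case, assumed here): each unique vertex is shaded
    once and the result is reused for every triangle that shares it.
    """
    if indexed:
        return vertex_count
    return 3 * triangle_count


VERTS, TRIS = 618, 738  # the mesh quoted in the post above

non_indexed = vertex_shader_invocations(VERTS, TRIS, indexed=False)
indexed = vertex_shader_invocations(VERTS, TRIS, indexed=True)

print(non_indexed)                      # 2214
print(indexed)                          # 618
print(round(non_indexed / indexed, 2))  # 3.58 (modelled invocation ratio)
print(round(10.7 / 3.8, 2))             # 2.82 (measured frame time ratio)
```

The modelled ratio is larger than the measured one, which is what you would expect if real hardware reuses fewer vertices than the best case and if part of the frame time is not vertex work at all.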

That’s confusing!

Sorry, I’ve attached two images below that will hopefully make it clear; in both scenarios I’m rendering the same (or almost the same) number of instances.
The first one is using Graphics.DrawProcedural


The second one is using Graphics.DrawMeshInstanced

Notice that Graphics.DrawMeshInstanced ends up calling DrawIndexedInstanced

I understood what you meant; the confusing part is that Unity’s DrawProcedural calls DirectX’s DrawInstanced, while Unity’s DrawInstanced calls yet another DrawInstanced variant… :smile:

Well, as a final note, I just found out that the Unity 5.6 beta has Graphics.DrawMeshInstancedIndirect, which will allow me to do culling on the GPU! It also seems that the 1023 limit has been removed :slight_smile:
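
For anyone reading later: the indirect call takes its draw parameters from a GPU buffer instead of from CPU-side arguments, which is what makes GPU culling possible. As far as I know, that arguments buffer holds five unsigned integers in the order shown below. This Python sketch only illustrates the layout; make_indirect_args is a hypothetical helper, and the 4000 instances and 738-triangle mesh are just the numbers from earlier in the thread.

```python
import struct


def make_indirect_args(index_count_per_instance, instance_count,
                       start_index=0, base_vertex=0, start_instance=0):
    """Pack the five 32-bit unsigned integers of an indexed indirect draw.

    Order: index count per instance, instance count, start index location,
    base vertex location, start instance location.
    """
    return struct.pack("<5I", index_count_per_instance, instance_count,
                       start_index, base_vertex, start_instance)


# 738 triangles -> 2214 indices per instance, 4000 instances (thread numbers).
args = make_indirect_args(index_count_per_instance=738 * 3, instance_count=4000)

print(len(args))                   # 20 bytes: five 4-byte integers
print(struct.unpack("<5I", args))  # (2214, 4000, 0, 0, 0)
```

The useful part is the second value: a culling pass on the GPU can overwrite the instance count, so the CPU never needs to know how many instances survived.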

Docs:

Interesting,

What was this limit of 1023 you were talking about?

Have you checked how many batches you get with your different setups?
If I understand correctly, you should have only one batch for all your meshes if they are instanced (unless you exceed the 65535-vertex limit).

Yeah, it was one batch. The limit I was talking about only applies when using DrawMeshInstanced; it’s also in the docs:

“Note: You can only draw a maximum of 1023 instances at once.”
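
In practice that limit means a larger instance count has to be split across several calls. The sketch below shows only the chunking arithmetic, using the 4000 instances mentioned at the start of the thread; split_into_batches is a hypothetical helper name, and the actual draw call per batch is left out.

```python
MAX_INSTANCES_PER_CALL = 1023  # limit quoted from the docs above


def split_into_batches(instance_count, limit=MAX_INSTANCES_PER_CALL):
    """Return (start, count) pairs that cover instance_count instances,
    with no batch larger than limit."""
    return [(start, min(limit, instance_count - start))
            for start in range(0, instance_count, limit)]


batches = split_into_batches(4000)

print(len(batches))  # 4
print(batches)       # [(0, 1023), (1023, 1023), (2046, 1023), (3069, 931)]
```

Each (start, count) pair would correspond to one draw call over that slice of the per-instance data.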

Do you know why this limit was imposed on DrawMeshInstanced?

Will you code your game entirely on the GPU, including collisions, effects, etc.?
I guess you will never read GPU-computed data back on the CPU side because of the bottleneck?

I don’t know why they put that limit in, but I’m guessing it may be related to the “other tasks” they perform with DrawMeshInstanced, for example the shadow caster pass. Maybe they wanted to keep it sane.

I don’t know if the last two questions were directed at me or if you were talking about Unity’s implementation, but currently I’m working with a lot of particles and moving objects, so I try to offload as much as I can from the CPU. In the case of the particles, I generate them with one compute shader and cull them with another, so I never have to perform a readback to the CPU; it all stays on the GPU. Once the processed elements are in a compute buffer, I can attach it to a material and access them from the vertex/fragment shader.
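
The generate, cull, draw flow described above can be sketched on the CPU to show the data flow. This is only an illustration under loud assumptions: a simple distance test stands in for whatever culling the real compute shader does, a Python list stands in for the compacted GPU buffer, and cull_and_compact and the sample positions are made up. On the GPU the survivor count would be written into the draw arguments without ever reaching the CPU.

```python
def cull_and_compact(positions, camera_pos, max_distance):
    """Keep only positions within max_distance of the camera.

    Stand-in for a culling compute pass: survivors are written out
    contiguously, as an append-style buffer would do on the GPU.
    """
    max_sq = max_distance * max_distance
    visible = []
    for p in positions:
        dist_sq = sum((a - b) ** 2 for a, b in zip(p, camera_pos))
        if dist_sq <= max_sq:
            visible.append(p)
    return visible


# Hypothetical instance positions (would come from the generating pass).
positions = [(0.0, 0.0, 5.0), (0.0, 0.0, 50.0), (3.0, 4.0, 0.0), (100.0, 0.0, 0.0)]

# Draw arguments: [index count per instance, instance count, 0, 0, 0].
args = [2214, 0, 0, 0, 0]

visible = cull_and_compact(positions, camera_pos=(0.0, 0.0, 0.0), max_distance=10.0)
args[1] = len(visible)  # only the surviving instances get drawn

print(visible)  # [(0.0, 0.0, 5.0), (3.0, 4.0, 0.0)]
print(args)     # [2214, 2, 0, 0, 0]
```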

It was just curiosity about your work!

I’m still wondering why Unity doesn’t include a full-GPU Shuriken, at least without physics interaction, since it would allow far more particles to be shown and moved.

Anyway, have fun.