Hi, I’d never worked with instancing until a few days ago, and I have a few questions I’d like to ask. Currently, instead of using Graphics.DrawMeshInstanced I’m using Graphics.DrawProcedural, so my first question is:
Is there a difference between them? Can the same be achieved with both?
What I’m doing is attaching a few ComputeBuffers to a material (indices and vertex attributes), and in the vertex shader I access them with this:
But for some reason I’m not able to render as many instances as I thought I would (having seen all the demos with a ton of asteroids or things like that). As in the example code I wrote above, I tried to simplify everything as much as I could, but with 4000 instances I was getting 44ms (on a Radeon HD 6870).
The problem (as far as I can tell) seems to be the number of vertices my model has (around 1000, so around 3000 triangles…), because the only way I could find to bring the ms down was to reduce the number of either instances or vertices in the model.
Note that I also tried to reduce the number of attributes I’m sending per vertex and per instance (per vertex I only send the vertex position, and per instance the world position, so it’s just 6 floats, about 24 bytes), but it didn’t seem to matter. So my second question is:
Is there something that can be done to optimize this aside from reducing the number of vertices? I am aware that I can implement culling on the GPU as well (and I plan to do so), but first I want to see how many instances I can get on screen while keeping a good or average framerate.
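As a rough sanity check on why trimming attributes made no difference, here is a back-of-the-envelope sketch of the buffer sizes involved. It assumes tightly packed 32-bit floats and uses the round figures quoted above (about 1000 vertices, 4000 instances); the helper name is made up for illustration.

```python
FLOAT_BYTES = 4  # size of one 32-bit float


def buffer_bytes(element_count, floats_per_element):
    """Size in bytes of a tightly packed buffer of float attributes (assumed layout)."""
    return element_count * floats_per_element * FLOAT_BYTES


# Figures from the post: position only per vertex, world position only per instance.
vertex_buffer = buffer_bytes(1000, 3)    # per-vertex data
instance_buffer = buffer_bytes(4000, 3)  # per-instance data

print(vertex_buffer, instance_buffer)
```

Both buffers come out at a few tens of kilobytes, so the amount of data being sent is unlikely to be the bottleneck here, which fits with the timings not changing.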
Because I’m using a compute shader to calculate the positions of the objects.
And what do you mean by DrawProcedural doesn’t support instancing? I’m still trying to figure out the difference between DrawInstanced and DrawProcedural, because from all I can tell, they should end up doing the same thing, right?
But from my tests, drawing 1000 objects with DrawProcedural takes 2.5ms, and with DrawMeshInstanced it takes 1.8ms, using the same bare-bones shader, so Unity must be doing something else to optimize things that I’m not aware of…
I don’t know the differences between these functions, except for the fact that DrawMesh/DrawMeshInstanced support shadows and DrawProcedural does not.
Therefore I would not be surprised if there are other differences.
This topic may give you some hints
I have never mixed instancing with a ComputeBuffer, since I always use the buffer with a geometry shader (a single-point mesh from which I create the faces), so instancing is pointless in my case. (Instancing allows the mesh data to be stored only once GPU-side instead of once per instance.)
Alright, I see now what’s going on. I used Intel GPA to look at the calls that were issued, and it seems that DrawProcedural internally calls DirectX’s DrawInstanced, while DrawMeshInstanced internally calls DirectX’s DrawIndexedInstanced. With an indexed draw the GPU can reuse vertices that are shared between triangles, and this drastically reduces the vertex shader invocations.
So in conclusion, it seems that DrawProcedural is not very good if you are going to draw meshes with a large number of vertices. This is a shame, because I can’t do the culling in a compute shader and then call DrawMeshInstanced, but oh well.
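To put rough numbers on that, here is a small sketch of the vertex shader invocation counts for the 618-vertex, 738-triangle mesh mentioned in this thread. It assumes a plain triangle list and, for the indexed case, an ideal vertex cache where every unique vertex is shaded exactly once per instance. Real hardware lands somewhere between the two figures. The function names are hypothetical.

```python
def invocations_non_indexed(triangles, instances):
    """Non-indexed draw: the vertex shader runs once per triangle corner."""
    return 3 * triangles * instances


def invocations_indexed_best_case(unique_vertices, instances):
    """Indexed draw, assuming a perfect cache: once per unique vertex."""
    return unique_vertices * instances


instances = 1000
non_indexed = invocations_non_indexed(738, instances)
indexed = invocations_indexed_best_case(618, instances)

print(non_indexed, indexed, round(non_indexed / indexed, 2))
```

Under those assumptions the non-indexed path does roughly 3.6 times the vertex work for the same picture, which points in the same direction as the timings reported above.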
And for future reference, using a mesh with 618 verts and 738 tris takes (on my GPU):
Sorry, I’ve attached 2 images below that will hopefully make it clear. In both scenarios I’m rendering the same (or almost the same) number of instances.
The first one is using Graphics.DrawProcedural
Well, as a final statement, I want to say that I just found out that the Unity 5.6 beta has Graphics.DrawMeshInstancedIndirect, which will allow me to do culling on the GPU! It also seems that the 1023 limit has been removed.
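For anyone wondering how the indirect variant makes GPU culling possible: as far as I understand the Unity docs, the draw parameters live in an args buffer of five 32-bit unsigned ints (index count per instance, instance count, start index location, base vertex location, start instance location). The sketch below only models that layout in Python, it is not Unity code, and the function name is made up.

```python
import struct


def pack_indirect_args(index_count_per_instance, instance_count,
                       start_index=0, base_vertex=0, start_instance=0):
    """Pack the five uint32 draw arguments in the order described above."""
    return struct.pack("<5I", index_count_per_instance, instance_count,
                       start_index, base_vertex, start_instance)


# 738 triangles -> 2214 indices per instance; 4000 instances before culling.
args = pack_indirect_args(3 * 738, 4000)
print(len(args), struct.unpack("<5I", args))
```

The point is that the instance count is just a value in a GPU buffer, so a culling compute shader can overwrite it with the number of visible instances and the CPU never has to read anything back.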
What was this limit of 1023 you were talking about?
Have you checked how many batches you have with your different setups?
If I understand it correctly, you should have only one batch for all your meshes if they are instanced (unless you exceed the 65535-vertex limit).
Do you know why this limit was imposed on DrawMeshInstanced?
Will you code your game entirely on the GPU, including collisions, effects, etc.?
I guess you will never read the GPU-computed data back on the CPU side because of the bottleneck?
I don’t know why they put that limit, but I’m guessing it may be related to the “other tasks” that they perform with DrawMeshInstanced, for example shadow caster pass. Maybe they wanted to keep it sane.
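In practice, assuming the limit means that a single DrawMeshInstanced call accepts at most 1023 instances, drawing more than that means splitting the instances across several calls. A minimal sketch of that split (the helper name is hypothetical):

```python
MAX_INSTANCES_PER_CALL = 1023  # assumed per-call limit discussed above


def split_into_batches(instance_count, batch_size=MAX_INSTANCES_PER_CALL):
    """Return the number of instances to submit in each draw call."""
    sizes = []
    remaining = instance_count
    while remaining > 0:
        take = min(batch_size, remaining)
        sizes.append(take)
        remaining -= take
    return sizes


print(split_into_batches(4000))
```

So the 4000 instances from the first post would need four calls rather than one.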
I don’t know if the last 2 questions were directed at me or if you were talking about Unity’s implementation, but currently I’m working with a lot of particles and moving objects, so I try to offload the CPU as much as I can. In the case of the particles, I’m generating them with one compute shader and culling them with another, so I never have to perform a readback to the CPU; it all stays on the GPU. With the processed elements in a compute buffer, I can attach them to a material and access them from there in the vertex/fragment shader.
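To illustrate the culling step, here is a CPU-side model of what such a pass does: test each element, append the survivors to an output buffer, and keep a count that can serve as the instance count for the draw. This is only a sketch. The real thing would be a compute shader writing to an append buffer, and the axis-aligned box test is a stand-in assumption for an actual frustum test.

```python
def cull_and_compact(positions, box_min, box_max):
    """Keep only the positions inside the box; return them and their count."""
    visible = []
    for p in positions:
        inside = all(lo <= c <= hi for c, lo, hi in zip(p, box_min, box_max))
        if inside:
            visible.append(p)  # models an append-buffer write
    return visible, len(visible)


particles = [(0.0, 0.0, 0.0), (5.0, 1.0, 0.0), (0.5, 0.5, 0.5), (-2.0, 0.0, 0.0)]
visible, count = cull_and_compact(particles, (-1.0, -1.0, -1.0), (1.0, 1.0, 1.0))
print(count, visible)
```

On the GPU the count would stay in a buffer and feed the draw arguments directly, which is what keeps the readback out of the loop.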
I’m still wondering why Unity doesn’t include a full GPU version of Shuriken, at least without physics interaction, since it would allow far more particles to be shown and moved.