Why would you not have GPU instancing on?

I am trying to optimise my game and came across GPU instancing, which seems perfect for the trees in my game. I am going to turn GPU instancing on for all of them, but why wouldn't such a feature be on by default and never turned off? I haven't found a downside to it.

It has been discussed before. It has overhead, it might make performance worse, it depends; profile your game.

Do you have a link to that?

I know Static and Dynamic batching both have pros and cons, but I've yet to see the cons of GPU instancing.


Instancing has some additional fixed costs associated with it, both on the CPU and GPU. For a small number of objects it's potentially more expensive to use instancing vs even just drawing each individually.


Can you be more specific? I'd like to know more about it.

To do instancing, the CPU has to gather all of the renderers using the same material, mesh, and shader variant, then pack all of the transforms and other per-instance data into one or more arrays and upload them to the GPU all at once. This is similar to the work that happens for dynamic batching, though instancing has the additional work of going through the optional material property blocks assigned to the renderers on top of just the base materials. Those arrays also either need to be recreated each frame if the number of objects changes, or you reuse an existing larger array and re-upload it even though only part of the data is changing. Both approaches have costs, but you do want the number of objects to change, because you want to cull the objects not in view so you're not paying the GPU cost of calculating vertices for meshes you won't see.
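As a rough mental model of that CPU-side work (plain Python, not the engine's actual code; all names here are illustrative), the gathering and packing step looks something like this:

```python
from collections import defaultdict

def build_instancing_batches(renderers):
    """Group renderers that can share one instanced draw, then pack
    their per-instance data into flat arrays for upload to the GPU.
    'renderers' is a list of dicts with mesh/material/variant/transform
    and optional per-renderer property overrides (a toy model)."""
    groups = defaultdict(list)
    for r in renderers:
        # Only renderers with identical mesh, material, and shader
        # variant can share the same instanced draw call.
        key = (r["mesh"], r["material"], r["shader_variant"])
        groups[key].append(r)

    batches = []
    for (mesh, material, variant), members in groups.items():
        # Pack transforms (and any property-block overrides) into
        # contiguous arrays -- this is the per-frame CPU cost, and it
        # has to be redone (or re-uploaded) when the set of visible
        # objects changes.
        batches.append({
            "mesh": mesh, "material": material, "variant": variant,
            "instance_count": len(members),
            "transforms": [r["transform"] for r in members],
            "colors": [r.get("props", {}).get("color", (1, 1, 1))
                       for r in members],
        })
    return batches

# Three trees sharing one material collapse into a single instanced
# draw; the rock still needs its own.
renderers = [
    {"mesh": "tree", "material": "bark", "shader_variant": "fog", "transform": t}
    for t in ("T0", "T1", "T2")
] + [{"mesh": "rock", "material": "stone", "shader_variant": "fog", "transform": "T3"}]

batches = build_instancing_batches(renderers)
print(len(batches))                  # 2 batches
print(batches[0]["instance_count"])  # 3 trees in one instanced draw
```

Note that all of this grouping and packing happens every frame the set of objects changes, which is where the fixed CPU overhead comes from.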

On the GPU, using instanced data adds some indirect data access, which adds some cost. When using instancing, all of the data is in those arrays; the shader gets told the instance ID and then has to go to each of those arrays and fetch the data from them, whereas with normal rendering, or static/dynamic batching, the data is handed directly to the shader, ready to be used immediately. How much this costs depends on the GPU, with mobile GPUs taking a bigger hit than desktop GPUs.
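To make the indirection concrete, here is a toy model of the two access patterns (plain Python standing in for shader logic, not real shader code):

```python
# Non-instanced: the transform and color are bound directly for this
# draw; the shader simply reads them.
def shade_direct(transform, color, vertex):
    return (transform, color, vertex)  # data already "in hand"

# Instanced: the shader receives only an instance ID and must index
# into each per-instance array -- one extra lookup per array, which is
# the small fixed GPU cost described above.
def shade_instanced(instance_id, transforms, colors, vertex):
    transform = transforms[instance_id]  # indirect fetch #1
    color = colors[instance_id]          # indirect fetch #2
    return (transform, color, vertex)

transforms = ["T0", "T1", "T2"]
colors = ["green", "green", "brown"]
print(shade_direct("T1", "green", "v"))
print(shade_instanced(1, transforms, colors, "v"))  # same result, extra lookups
```

Both paths produce the same shading inputs; the instanced path just pays a little extra to find them.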

One thing most people get wrong about both dynamic batching and instancing is that neither is really about making the GPU render faster; they're about GPU utilization. Technically speaking, the GPU is doing roughly the same amount of "work" in roughly the same amount of time regardless of whether objects are rendered individually vs batched or instanced (and actually more work for instancing). What both do is remove the downtime the GPU spends stalled, waiting for additional commands from the CPU. If you tell a modern GPU, even a mobile one, to render a single sprite, it'll finish in nanoseconds and sit stalled waiting for the next command from the CPU. So both techniques trade a little more CPU time to get a "bigger" draw ready, with the expectation that issuing one "bigger" draw takes less CPU time than the total of lots of individual draws, and that the GPU will stay busy long enough that the CPU can send the next draw command before the previous one finishes, so the GPU isn't waiting around.

But because preparing those "bigger" draws takes a bit more time on the CPU, if you're only doing it with a small number of meshes you may end up increasing how long the CPU takes vs individual draws.


Do you know of any guidelines as to the order of magnitude, or is it so specific to individual meshes, materials, and/or hardware that you just have to try it out for everything and see what works best? Of course, thousands of blades of grass are a clear-cut case, but in a projectile-based shooter where there can be, let's say, up to ~100 bullets of a kind in the scene at any point, would it make up for the overhead, especially when the bullet count is below the maximum?