VFX Graph and batching/instancing

Hi, I am just getting started with VFX Graph. We have a game with tons of Shuriken ParticleSystems, and after a few tests we could see how much faster VFX Graph is, which is awesome. However, using the Frame Debugger, we noticed that Unity is unable to batch the draw calls of two separate instances of the same VFX Graph. This happens even with the simplest possible graph (spawning quads with the default shader and texture and doing nothing to them) in a new empty scene with no scripts, lights, or anything else.

I have tested everything I could think of: I made sure GPU instancing is enabled in the generated material, tried disabling the SRP Batcher (we are using URP), and made sure nothing specific to our project was affecting the results. So now I am unsure: is this the intended behaviour? If I have 10 instances of the same VFX Graph, each outputting a simple quad with the default texture and shader, all visible, is that intended to be 10 draw calls? I verified that these are separate draw calls in both the Frame Debugger and RenderDoc. In RenderDoc I can see that they use the same buffers, so everything is in line for them to be batched into one draw call, but they just aren’t; it’s as if batching is not supported at all. In the Frame Debugger, the reason given for the batch break is either “Unknown reason” or “different material”. Here is a screenshot of my full VFX Graph, which is as simple as possible:

I would appreciate any help or hints on what to look for. At this point I’m starting to wonder whether it might be a bug in our version (2022.3.50) or some obscure project setting that is breaking the batching.

I think separate instances of a VFX graph aren’t batched together. I followed this to make a single instance of my VFX graph render multiple instances.

Hi!

It may seem strange, but it is actually expected. Don’t worry, it should not have a big performance impact.

When using instancing, like you said, we try to batch compute dispatches and drawcalls together. But sometimes it is not possible, or it is not the best option.

In this particular case, it seems that the drawcalls are split because indirect draw is enabled, which is the default for transparent particle outputs.

With indirect draw, each instance may have a different number of particles being rendered. If we drew all instances together, it would not be trivial to identify which particle belongs to which system. A solution for this would be to build a prefix sum over the per-instance particle counts and use binary search to identify the instance. Instead, we decided to issue separate drawcalls.
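To illustrate the prefix sum and binary search idea mentioned above, here is a minimal CPU-side sketch in Python. It is not how VFX Graph is implemented (the real thing would run on the GPU, and as stated above this approach was not chosen); the function names `build_offsets` and `find_instance` are hypothetical and exist only for this example.

```python
from bisect import bisect_right
from itertools import accumulate


def build_offsets(particle_counts):
    """Exclusive prefix sum over per-instance particle counts.

    offsets[i] is the global index of the first particle of instance i;
    offsets[-1] is the total number of particles.
    """
    return [0] + list(accumulate(particle_counts))


def find_instance(offsets, particle_index):
    """Binary search for the instance that owns a global particle index.

    Returns (instance_index, local_particle_index).
    """
    if not 0 <= particle_index < offsets[-1]:
        raise IndexError("particle index out of range")
    # bisect_right skips past instances with zero particles, because
    # their offset equals the offset of the following instance.
    instance = bisect_right(offsets, particle_index) - 1
    return instance, particle_index - offsets[instance]


# Four instances with different live particle counts (one is empty).
counts = [3, 0, 5, 2]
offsets = build_offsets(counts)
owner = find_instance(offsets, 3)
```

With a single merged draw, every particle would need this lookup to find its own instance data, which is the extra cost that separate drawcalls avoid.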

As I said before, that does not necessarily mean a big CPU cost. Regular drawcalls require some processing to set the shader, uniforms, etc., but that is not the case here. The shader is set up only once for each set of instances, instead of once per drawcall, and only the minimum information is uploaded between drawcalls.

A good way to compare with non-instanced rendering is to look for the “ApplyShader” marker in the profiler.

Be aware that ApplyShader may be called more than once per batch. For instance, if you have other transparent objects in between your instances, it will be split in several groups.

Another reason to split batches can be using different exposed textures for each instance.
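As a rough mental model of the two splitting cases above, the sketch below counts how many groups a sorted draw order falls into. It is a toy model, not Unity's renderer: the draw list, the `(effect, texture)` state tuples, and the function `count_state_groups` are all assumptions made up for this illustration.

```python
from itertools import groupby


def count_state_groups(draw_order, effect):
    """Count contiguous runs of identical render state for one effect.

    draw_order is a list of (effect_name, texture_name) tuples in the
    order they are drawn. Each run of identical tuples stands for one
    group that could share a single shader setup in this toy model.
    """
    return sum(1 for state, _run in groupby(draw_order) if state[0] == effect)


# Two "spark" instances, then an unrelated transparent object, then a
# third "spark" instance, then one using a different exposed texture.
order = [
    ("spark", "texA"),
    ("spark", "texA"),
    ("glass", "texG"),
    ("spark", "texA"),
    ("spark", "texB"),
]
groups = count_state_groups(order, "spark")
```

In this example the four "spark" instances end up in three groups: the object drawn in between splits the first run, and the different texture splits the last one.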

Anyway, if your compute shaders are batched, instancing should be working fine :)

Hope that helps!