I have a handful of simple meshes with a ton of reskins from the same texture atlas (building a 3d tiling system)
I could either make them the same material and have every tile take a different mesh to choose different UVs (even though they’re the same shape)
-OR-
I could make them all different materials/use property blocks to modulate the texture sampling.
Which one would you expect to yield better performance?
Basically you don’t really gain all that much from atlasing vs using Array Textures, so you don’t have to try too hard to reach good performance. Most of your problem is actually going to be Unity being dog slow on the CPU side, submitting all those draws, culling etc.
You have to test it, obviously. It won’t take you long, maybe a couple of hours?
Fair enough-- I definitely will be testing it as well, but I suppose I’m asking in the hopes of understanding what’s going on behind the scene so I can make better educated guesses (which you’ve generously helped me with).
My understanding reading up a bit on what you said is:
I should be able to make all my static tiles into big static meshes using some mesh baker IFF I use UVs, which would lead to a clear advantage for CPU-time HOWEVER
I could only use GPU instancing if I was rendering identical meshes (i.e., I would have to do this with sampling modulation and leave the UVs all the same). My understanding is that the built-in renderer would let me do this while modulating tiling parameters in a property block, but that, because I’m in URP, using property blocks would prevent me from using GPU instancing with the same material.
As hippocoder said, you have to make sure that batching works because Unity is very slow at submitting draw calls.
But I don’t think you have to choose between one or the other. Since you have the same mesh and a texture atlas, you can simply use GPU instancing and adjust the UV coordinates in the vertex shader. You could use SV_InstanceID or a MaterialPropertyBlock to pass in the instance number. MaterialPropertyBlocks don’t break GPU instancing.
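To make the UV adjustment concrete, here is a minimal sketch of the arithmetic a vertex shader would do per instance. It is plain Python rather than shader code, and it assumes a regular grid atlas with cell 0 at UV (0, 0) and cells numbered row by row. The names atlas_scale_offset and remap_uv are made up for illustration; they are not Unity API.

```python
def atlas_scale_offset(tile_index, columns, rows):
    """Return ((scale_u, scale_v), (offset_u, offset_v)) for one atlas cell.

    Assumes a regular columns x rows grid, numbered row by row from UV (0, 0).
    """
    if not 0 <= tile_index < columns * rows:
        raise ValueError("tile_index is outside the atlas")
    col = tile_index % columns
    row = tile_index // columns
    scale = (1.0 / columns, 1.0 / rows)
    offset = (col * scale[0], row * scale[1])
    return scale, offset


def remap_uv(uv, scale, offset):
    """Map a mesh UV in [0, 1] into the chosen atlas cell: uv * scale + offset."""
    return (uv[0] * scale[0] + offset[0], uv[1] * scale[1] + offset[1])


# Tile 5 in a 4x4 atlas sits in column 1, row 1.
scale, offset = atlas_scale_offset(5, columns=4, rows=4)
lower_left = remap_uv((0.0, 0.0), scale, offset)
upper_right = remap_uv((1.0, 1.0), scale, offset)
```

In a real shader the scale and offset (or just the tile index) would be the per-instance value, and every instance would keep the same mesh and the same 0..1 UVs.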
Unfortunately, I’m not that familiar with URP, which has new rules for the SRP batcher. My limited understanding is that the SRP batcher can batch draws that use different materials, reducing the CPU cost of the state changes between them rather than merging them into one draw call, as long as the shader (variant) is the same.
So for the SRP batcher to work you’d have to:
- Use an SRP batcher compatible shader
- NOT use MaterialPropertyBlocks
- Use different materials per instance that all use the same shader and shader keywords
If you want to use GPU instancing with URP, you have to:
- Use a shader that supports instancing
- Do use MaterialPropertyBlocks
- Use one material for all instances
- Enable instancing in the material
- Remove SRPBatcher compatibility
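As a rough way to picture the two rule sets above, here is a toy model in Python. It only encodes the grouping keys listed in this post (shader variant for the SRP batcher, mesh plus material for instancing) and ignores everything else a real renderer considers, such as draw order or instance limits. The Draw type and both functions are hypothetical, not Unity API.

```python
from collections import Counter
from typing import NamedTuple


class Draw(NamedTuple):
    mesh: str
    material: str
    shader_variant: str


def srp_batcher_groups(draws):
    """Toy model: draws are grouped when they share a shader variant."""
    return Counter(d.shader_variant for d in draws)


def instancing_groups(draws):
    """Toy model: draws are grouped when they share both mesh and material."""
    return Counter((d.mesh, d.material) for d in draws)


# Setup A: one material per tile, all on the same shader variant.
per_tile_materials = [Draw("quad", f"tile_mat_{i}", "TileLit") for i in range(4)]

# Setup B: one shared material; per-tile data would come from a property block.
shared_material = [Draw("quad", "tile_mat", "TileLit") for _ in range(4)]

print(len(srp_batcher_groups(per_tile_materials)))  # groups under the SRP batcher rule
print(len(instancing_groups(per_tile_materials)))   # groups under the instancing rule
print(len(instancing_groups(shared_material)))
```

Setup A suits the SRP batcher rule (one group) but splits into one group per material under the instancing rule; Setup B collapses to a single instancing group.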
See here for more info:
@hippocoder Feel free to correct me if I said something wrong
What an absurdly helpful answer! Thank you very much!
Parameterizing UVs and performing this in the vertex shader is a very interesting idea-- my plan is to use some off-the-shelf mesh baker, and if I understand correctly those work by stitching together textures, merging meshes, and automatically adjusting UVs in order to get everything into the same material.
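For reference, the merging step as I understand it can be sketched like this. It is engine-agnostic Python, not what any particular mesh baker does internally: it assumes flat unit quads, a regular grid atlas numbered row by row from UV (0, 0), and made-up names (make_quad, bake_tiles). The triangle winding is arbitrary here.

```python
def make_quad(x, z, size=1.0):
    """One flat quad at (x, z) with UVs covering the full 0..1 range."""
    positions = [(x, 0.0, z), (x + size, 0.0, z),
                 (x + size, 0.0, z + size), (x, 0.0, z + size)]
    uvs = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
    indices = [0, 1, 2, 0, 2, 3]  # two triangles
    return positions, uvs, indices


def bake_tiles(tiles, columns, rows):
    """Merge tiles given as (x, z, tile_index) into one mesh.

    Each tile's UVs are rewritten to point at its atlas cell, so the
    merged mesh needs only one material and no per-tile parameters.
    """
    positions, uvs, indices = [], [], []
    scale_u, scale_v = 1.0 / columns, 1.0 / rows
    for x, z, tile_index in tiles:
        quad_pos, quad_uv, quad_idx = make_quad(x, z)
        base = len(positions)  # index offset for this tile's vertices
        offset_u = (tile_index % columns) * scale_u
        offset_v = (tile_index // columns) * scale_v
        positions.extend(quad_pos)
        uvs.extend((u * scale_u + offset_u, v * scale_v + offset_v)
                   for u, v in quad_uv)
        indices.extend(base + i for i in quad_idx)
    return positions, uvs, indices


# Two neighbouring tiles using atlas cells 0 and 5 of a 4x4 atlas.
positions, uvs, indices = bake_tiles([(0.0, 0.0, 0), (1.0, 0.0, 5)], 4, 4)
```

The point of the sketch is that the per-tile choice ends up stored in the vertex UVs, which is why the baked result can share a single material.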
I’m not sure how I’d be able to combine meshes if they relied on having separately parameterized materials to get their UVs.
Your post and link helped me finally understand the difference between SRP batching and GPU instancing.
Sorry to revive a somewhat old thread, but could you clarify the ‘DrawInstancedIndirect (GameObjects)’ one? I reckon you mean ‘DrawMeshInstancedIndirect’, except the point of that is that it doesn’t use GameObjects, right? You set a buffer full of positions/matrices/whatever your shader needs for positioning onto your shader and then have those drawn by the GPU directly - very useful for something like a grass rendering system. I imagine it doesn’t get any faster than that (I’m having my character run through literally millions of instances with a decent framerate), but you’re saying Hybrid v2 beats that? (I’m trying to find technical details on how Hybrid v2 works, but other than seeing that it works with the SRP batcher, I can’t really find much in-depth…)
I think that statement was a mistake. The DrawMeshInstancedIndirect approach is often used in pure GPU solutions (with LOD switching, occlusion, frustum culling, etc.). Nothing CPU-based (even Burst-compiled) can beat that.