I'm working on a vertex animation shader for foliage and have a question about optimization.
If 100 identical prefab objects with the same rotation and scale have their vertices translated in the shader like this:
i.vertex.xyz += 1 * i.color.x;
Is the vertex transform being calculated 100 times, or only once for the one unique object, with the world transform of the other 99 objects then applied to that vertex data? It seems like this could get pretty costly for performance if a lot of vertices are passed through the shader.
That code will be executed for every vertex of every object, no matter how you split it up or batch it. Batching might let this happen for multiple objects at once, which reduces draw calls, but the calculation still runs for every vertex in the batched geometry.
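To make that concrete, here is a small C++ sketch that simulates it on the CPU. It is not HLSL, and Vertex, DrawStats, vertexShader, drawUnbatched and drawBatched are hypothetical names made up for illustration. Merging 100 copies of a mesh into one batch cuts the draw calls from 100 to 1, but the per-vertex function still runs once for every vertex of every copy.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU stand-in for one vertex: position plus the red channel of the
// vertex color (i.vertex.xyz and i.color.x in the shader line).
struct Vertex {
    float x, y, z;
    float colorR;
};

// Bookkeeping for what the simulated GPU was asked to do.
struct DrawStats {
    std::size_t drawCalls = 0;
    std::size_t shaderInvocations = 0;
};

// Stand-in for the vertex shader body: i.vertex.xyz += 1 * i.color.x;
Vertex vertexShader(Vertex v) {
    v.x += v.colorR;
    v.y += v.colorR;
    v.z += v.colorR;
    return v;
}

// One draw call: the shader body runs once for every vertex submitted.
// The output goes down the pipeline; the source mesh is never modified.
void draw(const std::vector<Vertex>& vertices, DrawStats& stats) {
    ++stats.drawCalls;
    for (const Vertex& v : vertices) {
        Vertex out = vertexShader(v);
        (void)out;  // would be passed on to rasterization
        ++stats.shaderInvocations;
    }
}

// Unbatched: each object is submitted as its own draw call.
DrawStats drawUnbatched(const std::vector<Vertex>& mesh, std::size_t objectCount) {
    DrawStats stats;
    for (std::size_t i = 0; i < objectCount; ++i) draw(mesh, stats);
    return stats;
}

// Batched: copies of the mesh are merged into one buffer and drawn once.
DrawStats drawBatched(const std::vector<Vertex>& mesh, std::size_t objectCount) {
    std::vector<Vertex> combined;
    combined.reserve(mesh.size() * objectCount);
    for (std::size_t i = 0; i < objectCount; ++i)
        combined.insert(combined.end(), mesh.begin(), mesh.end());
    DrawStats stats;
    draw(combined, stats);
    return stats;
}
```

For example, with a 500-vertex mesh and 100 objects, both functions report 50,000 shader invocations; only drawCalls differs (100 versus 1).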
If you want this applied to all of the instances, you'd be better off editing the sharedMesh of one of the instances directly at runtime. Don't use mesh for this: accessing it just creates a copy of the mesh for that object, so only that single object's geometry would change, whereas sharedMesh is the source mesh that all the instances use. That said, I'm not sure you'd get much in the way of savings; it's a fairly cheap calculation.
Without trying it, I can't say whether moving the work to the CPU (sharedMesh editing) rather than the GPU (vertex shader) would save you much.
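The difference between mesh and sharedMesh can be modelled roughly like this. This is a hypothetical C++ analogue, not Unity's API: Mesh, MeshHolder, mesh(), sharedMesh() and offsetVertices are made-up names that only mimic the behaviour described above, where reading mesh gives the object its own copy and sharedMesh is the source every instance points at.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

struct Vec3 { float x, y, z; };

// Hypothetical stand-in for a mesh asset: just a vertex array.
struct Mesh {
    std::vector<Vec3> vertices;
};

// Hypothetical stand-in for the per-object component that references a mesh.
// sharedMesh() returns whatever mesh this object currently points at.
// mesh() mimics copy-on-first-access: the object gets a private clone,
// and from then on edits through it affect only this object.
class MeshHolder {
public:
    explicit MeshHolder(std::shared_ptr<Mesh> source) : current_(std::move(source)) {}

    std::shared_ptr<Mesh> sharedMesh() const { return current_; }

    std::shared_ptr<Mesh> mesh() {
        if (!ownsCopy_) {
            current_ = std::make_shared<Mesh>(*current_);  // clone once
            ownsCopy_ = true;
        }
        return current_;
    }

private:
    std::shared_ptr<Mesh> current_;
    bool ownsCopy_ = false;
};

// Same offset as the shader line, applied once per vertex of one mesh.
void offsetVertices(Mesh& m, float offset) {
    for (Vec3& v : m.vertices) {
        v.x += offset;
        v.y += offset;
        v.z += offset;
    }
}
```

Offsetting through sharedMesh() on any one of 100 holders moves the vertices for all of them, while offsetting through mesh() on one holder changes only that holder's copy. Either way, offsetVertices visits each vertex of a single mesh once, not once per object.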
Thanks a lot for the informative answer! I'll look into sharedMesh editing; in some scenarios it might be worth trying.
In my experiments with sharedMesh, I've found that if you change the vertex positions in the editor outside of Play mode, the changes become permanent for that model, because you're editing the source mesh itself (unless you re-import the model from the Inspector so that Unity regenerates the mesh). So be careful with that; it can get a bit fiddly.
What you're asking about is absolutely nothing to be concerned about; processing vertices is what a GPU DOES. As an aside, the multiply costs nothing extra thanks to fused multiply-add, but why are you multiplying by 1???
You could move the vertex animation to a script, where it would be done only once per frame on the CPU (for the one shared mesh) rather than 100 times per frame on the GPU (once for each object). However, I wouldn't be surprised if the result were slower than animating the vertices in the shader.
GPUs are very fast at moving multiple vertices around in parallel on their way to the screen. The CPU, on the other hand, has to modify each vertex in sequence and then upload the entire mesh from main memory to VRAM.
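Here is a rough C++ sketch of that CPU path, assuming the animation is re-applied from a stored rest pose each frame so offsets don't accumulate. GpuBuffer, animateOnCpu and their members are hypothetical; the plain vector assignment in upload only stands in for the transfer to VRAM that a real engine would perform.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

// Hypothetical stand-in for a GPU vertex buffer. On real hardware the
// copy in upload() is a transfer from main memory to VRAM.
struct GpuBuffer {
    std::vector<Vec3> data;
    std::size_t bytesUploaded = 0;

    void upload(const std::vector<Vec3>& src) {
        data = src;
        bytesUploaded += src.size() * sizeof(Vec3);
    }
};

// One frame of CPU-side vertex animation on the shared mesh: every vertex
// is visited one after another on a single thread, then the whole array
// is re-uploaded. The GPU still transforms every vertex afterwards.
void animateOnCpu(const std::vector<Vec3>& restPose,
                  const std::vector<float>& colorR,
                  std::vector<Vec3>& animated,
                  GpuBuffer& gpu) {
    animated.resize(restPose.size());
    for (std::size_t i = 0; i < restPose.size(); ++i) {
        animated[i] = {restPose[i].x + colorR[i],
                       restPose[i].y + colorR[i],
                       restPose[i].z + colorR[i]};
    }
    gpu.upload(animated);
}
```

Calling it on two consecutive frames for a 1,000-vertex mesh uploads the full array twice (2 × 1,000 × sizeof(Vec3) bytes), on top of the sequential loop over every vertex.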
Consider, too, that regardless of where you do your vertex animation, the GPU is still (at the very least) multiplying each vertex by the MVP (model-view-projection) matrix. Adding that one instruction to the shader barely increases the per-vertex load.
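A back-of-the-envelope count, assuming a minimal vertex shader that does nothing but the position transform (real shaders usually do more, and exact instruction counts depend on the shader compiler and GPU):

```latex
\begin{aligned}
\mathbf{p}_{\mathrm{clip}} &= \mathbf{M}_{\mathrm{MVP}}\,\mathbf{p}_{\mathrm{obj}},
  \quad \mathbf{M}_{\mathrm{MVP}} \in \mathbb{R}^{4\times 4}
  &&\Rightarrow\ 16\ \text{multiplies} + 12\ \text{adds}
  \ (\text{i.e. } 4\ \text{multiplies} + 12\ \text{multiply-adds}) \\
\mathbf{p}_{\mathrm{obj},xyz} &\leftarrow \mathbf{p}_{\mathrm{obj},xyz} + 1 \cdot c_x
  &&\Rightarrow\ 3\ \text{multiply-adds, or } 3\ \text{adds once } 1\cdot \text{ is folded away}
\end{aligned}
```

So the added line is on the order of three scalar operations on top of roughly sixteen that every vertex pays for anyway, before counting anything else the shader does.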