As far as I’m aware, there are two main methods that can feasibly achieve a high number of ‘skinned’ mesh animation instances in Unity.
Baked Meshes
This method involves creating a unique mesh for every frame of every animation, for every model and every LOD in the game. Then you choose the correct frame and use DrawMesh to render it. As a technique it sounds awful, but from what I’ve seen it is perfectly capable of rendering a high number of models on a good system.
Drawbacks
- Potentially heavy memory requirements depending upon the framerate of your animations, number of models, number of animations and number of LODs.
- The animation is basically a ‘flip book’, as you are simply presenting a static mesh each frame. Granted, we’ve had that in 2D games with sprites for decades, but I’m skeptical that it will feel good when mixed with non-baked animations or physics-driven objects, especially if the framerate is not a multiple of the baked animation rate.
- Culling and LOD selection (generally) still have to be done on the CPU in order to use DrawMesh.
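To get a feel for the memory drawback, a back-of-the-envelope estimate helps. The sketch below is only illustrative: the function name `baked_mesh_bytes`, the 32 bytes per vertex (position, normal and UV as floats) and all the counts are assumptions, not figures from a real project. It also ignores index buffers, which can be shared between frames.

```python
def baked_mesh_bytes(vertices_per_lod, bytes_per_vertex, fps, clip_seconds, models):
    """Estimate vertex memory for baked meshes.

    vertices_per_lod: vertex count of each LOD of one model
    clip_seconds:     length in seconds of each animation clip of one model
    Assumes every model has the same LODs and clips (a simplification).
    """
    frames = sum(round(fps * seconds) for seconds in clip_seconds)
    vertices_per_frame = sum(vertices_per_lod)
    return models * frames * vertices_per_frame * bytes_per_vertex


# Hypothetical numbers: 3 LODs, 32 bytes per vertex,
# 4 clips baked at 30 fps, 5 different models.
total = baked_mesh_bytes([2000, 1000, 500], 32, 30, [1.0, 1.0, 2.0, 0.5], 5)
print(f"{total / (1024 * 1024):.1f} MiB")
```

Every term multiplies the others, so doubling the baked framerate or adding a model grows the total linearly, which is why the requirements can get heavy quickly.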
Positives
- Interacts as expected with all the rest of the Unity rendering systems with no additional effort.
- Relatively straightforward to implement.
- Doesn’t tax the GPU.
- Works for older GPU systems as long as there is enough memory.
In fact, it is entirely possible that the downsides could be addressed with some clever tricks and by pushing more work to the GPU. For example, it should be feasible to provide the next frame’s mesh data alongside the current mesh, so that the vertex shader can interpolate between the two, something akin to the old Quake II MD2 format.
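As a rough sketch of that interpolation idea, here is the per-vertex blend written in plain Python rather than shader code. The function names and the one-vertex ‘mesh’ are made up for illustration, and it assumes a looping clip and a non-negative animation time.

```python
def frame_blend(anim_time, baked_fps, frame_count):
    """Return (current frame, next frame, blend factor) for a looping clip.

    Assumes anim_time >= 0.
    """
    position = anim_time * baked_fps
    whole = int(position)
    current = whole % frame_count
    following = (current + 1) % frame_count
    return current, following, position - whole


def lerp_vertices(frame_a, frame_b, t):
    """Per-vertex linear blend: the work a vertex shader would do."""
    return [
        tuple(a + (b - a) * t for a, b in zip(vertex_a, vertex_b))
        for vertex_a, vertex_b in zip(frame_a, frame_b)
    ]


# Four baked frames of a one-vertex "mesh", baked at 10 fps.
frames = [[(float(i), 0.0, 0.0)] for i in range(4)]
cur, nxt, t = frame_blend(anim_time=0.25, baked_fps=10, frame_count=len(frames))
blended = lerp_vertices(frames[cur], frames[nxt], t)
print(cur, nxt, t, blended)
```

Because the blend factor comes from the actual animation time, the rendered pose no longer snaps to the baked rate, at the cost of feeding two frames of vertex data to each draw.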
Skinned Instanced Meshes
Generally a more complex method, as it requires implementing your own skinning on the GPU as well as supporting instancing and dealing with a bunch of other stuff. All the bone animation data has to be extracted and passed to the GPU, where custom shaders use it along with the instance ID to render the mesh and animate it with bone matrix palette skinning. There is a good example with source for this on the NVIDIA website and in GPU Gems 3.
Thankfully, with ComputeBuffers you no longer have to pass the data via textures like they did in 2007. That said, everything I’ve read implies that using textures with bilinear filtering can automatically provide inter-frame interpolation. I’m not sure about this, as my understanding is that you cannot simply lerp two matrices: the positional part should be fine, but weird things are going to happen to the rotations. I’m going to have to give it a try sometime and see, though, as one of the biggest drawbacks of this method is that it will make your GPU cry due to the amount of work placed on the vertex shader.
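For anyone unfamiliar with matrix palette skinning, the core of it is just a weighted sum: each vertex is transformed by every bone that influences it and the results are blended by the bone weights. The sketch below shows that in plain Python instead of a shader. The names, the 3x4 row layout and the two-bone palette are assumptions for illustration; in a real palette each matrix would be the bone’s animated pose combined with its inverse bind pose.

```python
def transform_point(matrix, point):
    """Apply a 3x4 affine matrix (three rows of four numbers) to a point."""
    x, y, z = point
    return tuple(row[0] * x + row[1] * y + row[2] * z + row[3] for row in matrix)


def skin_vertex(position, bone_indices, bone_weights, palette):
    """Blend the vertex as transformed by each influencing bone.

    bone_weights are assumed to sum to 1.
    """
    out = [0.0, 0.0, 0.0]
    for index, weight in zip(bone_indices, bone_weights):
        moved = transform_point(palette[index], position)
        for axis in range(3):
            out[axis] += weight * moved[axis]
    return tuple(out)


# Hypothetical two-bone palette for one instance at one animation frame.
IDENTITY = ((1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0))
RAISED = ((1, 0, 0, 0), (0, 1, 0, 2), (0, 0, 1, 0))  # translate +2 on y
palette = [IDENTITY, RAISED]

skinned = skin_vertex((1.0, 0.0, 0.0), (0, 1), (0.5, 0.5), palette)
print(skinned)
```

With instancing, the only extra step is that the shader uses the instance ID (and that instance’s current frame) to find where its palette starts in the buffer.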
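The doubt about lerping matrices is easy to check numerically. This small sketch (2x2 rotations only, with made-up helper names) blends a 0 degree and a 90 degree rotation halfway and measures what comes out.

```python
import math


def rotation_z(degrees):
    """2x2 rotation matrix, enough to show the problem."""
    r = math.radians(degrees)
    c, s = math.cos(r), math.sin(r)
    return ((c, -s), (s, c))


def lerp_matrix(a, b, t):
    """Element-wise linear blend of two matrices."""
    return tuple(
        tuple(x + (y - x) * t for x, y in zip(row_a, row_b))
        for row_a, row_b in zip(a, b)
    )


def column_length(m, col):
    return math.hypot(m[0][col], m[1][col])


halfway = lerp_matrix(rotation_z(0), rotation_z(90), 0.5)
scale = column_length(halfway, 0)  # a pure rotation would give 1.0
angle = math.degrees(math.atan2(halfway[1][0], halfway[0][0]))
print(f"angle {angle:.1f} degrees, scale {scale:.3f}")
```

In this case the blended matrix points the right way (45 degrees) but shrinks the mesh to about 71% of its size. For two rotations that differ by an angle θ, the halfway scale works out to cos(θ/2), so the error is small when adjacent keyframes are close together and grows quickly as they move apart, which suggests the trick depends heavily on the stored framerate.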
Drawbacks
- It will use every ounce of your GPU power. The bone matrix palette skinning and all the lookups it requires are a constant overhead, and it’s per vertex! This is what makes LOD so important, as every vertex saved means saving considerable processing time in the vertex shader.
- Doesn’t always play nice with Unity’s rendering systems due to instancing. I think there might be a number of gotchas coming up with this, such as supporting light probes, or forward rendering not working with multiple lights. I know there is currently a bug in the demo where shadows no longer respect the positions of the drawn instances. I think this is shader related, as I’m sure it was working fine before adding the frustum culling method or LOD.
- Shadows are an issue, as they require rendering each instance again, or with cascades maybe several times. Since, as stated, the bottleneck of this technique is the vertex skinning, that cost is amplified: rendering every instance a second time roughly halves your framerate, and each further pass adds the same cost again. This could be alleviated with ‘stream out’, where the geometry output by the shader is stored on the GPU, which AFAIK is what Unity uses for its own GPU skinning. However, due to the sheer number of instances being rendered, this would be prohibitive in this case and worse than Baked Meshes in terms of memory requirements.
- Forward rendering has the same issue as shadows: every light requires an additional add-pass, which becomes prohibitively expensive as you are running all the vertex calculations again. So far, however, my experience has been that forward rendering with multiple lights is just broken, and even if it worked, according to the docs the add-pass instances would be rendered normally instead of instanced. It might just be feasible, though, if we can build on the custom shader provided by Valve for its VR LabRenderer, which I believe supports quite a few lights and shadows in forward rendering without using the add-pass technique.
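The way extra passes multiply the skinning cost can be put into numbers. This is a worst-case sketch with invented figures: it assumes every instance is visible in every shadow cascade and touched by every extra light, and that nothing is cached between passes.

```python
def vertex_invocations(instances, vertices, shadow_cascades=0, extra_lights=0):
    """Vertex shader runs per frame when every pass re-skins every instance."""
    passes = 1 + shadow_cascades + extra_lights
    return instances * vertices * passes


# Hypothetical crowd: 10,000 instances of a 1,000 vertex model.
base = vertex_invocations(10_000, 1_000)
with_shadows = vertex_invocations(10_000, 1_000, shadow_cascades=2)
with_lights = vertex_invocations(10_000, 1_000, shadow_cascades=2, extra_lights=3)
print(base, with_shadows, with_lights)
```

Under those assumptions, two shadow cascades triple the skinning work and three additional forward lights double it again, which is why avoiding add-passes matters so much for this technique.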
Positives
- Greatly reduces the amount of data to store on the GPU, thanks to frame interpolation. In the above demo the animation is stored at just 10 fps; that can easily be increased to, say, 30 fps and still take only a fraction of the storage that baked meshes would.
- Can easily offload frustum culling and LOD selection to the GPU, which can save a good chunk of CPU time. In addition, I want to add per-instance depth sorting to minimize overdraw (not sure how much of a win that will be in deferred). Taking it further, you could even drive the entire crowd on the GPU using simulation.
- To a degree it’s easily scalable to your hardware: it’s simple to adjust the number of instances, use lower vertex count models, dynamically change LOD settings, etc.
- It’s even possible to drive the instances via a Mecanim animator, though an animator per instance is not possible, not even close, and performance will suffer.
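The culling and LOD selection mentioned above boil down to a small, independent test per instance, which is what makes them a good fit for the GPU. Here is a minimal CPU-side sketch of that per-instance logic in Python. The bounding-sphere test, the box-shaped ‘frustum’, the LOD switch distances and all names are assumptions for illustration, not how the demo does it.

```python
import math


def sphere_in_frustum(center, radius, planes):
    """planes: (nx, ny, nz, d) with unit normals pointing into the frustum."""
    for nx, ny, nz, d in planes:
        distance = nx * center[0] + ny * center[1] + nz * center[2] + d
        if distance < -radius:
            return False  # completely behind this plane
    return True


def select_lod(distance, switch_distances):
    """n switch distances give n + 1 LODs; 0 is the most detailed."""
    for lod, limit in enumerate(switch_distances):
        if distance <= limit:
            return lod
    return len(switch_distances)


# Hypothetical box-shaped "frustum": x in [-10, 10], z in [0, 100].
planes = [(1, 0, 0, 10), (-1, 0, 0, 10), (0, 0, 1, 0), (0, 0, -1, 100)]
camera = (0.0, 0.0, 0.0)
centers = [(0.0, 0.0, 50.0), (20.0, 0.0, 50.0), (10.5, 0.0, 5.0)]

# (instance index, LOD) for every instance that survives culling.
draw_list = [
    (i, select_lod(math.dist(camera, c), [20.0, 60.0]))
    for i, c in enumerate(centers)
    if sphere_in_frustum(c, 1.0, planes)
]
print(draw_list)
```

Each instance is handled without looking at any other, so the same loop body maps naturally onto one compute shader thread per instance.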
Driven by Mecanim
It’s possible to drive the skinned instance method via Mecanim, but each instance cannot use an individual animator/animation.
Mecanim is pretty amazing, but it has a reasonably large overhead, and that overhead is considerably worse when you cannot use the ‘Optimize Game Objects’ option. That option cannot be used because currently the only way to get the animated bone data is to fetch the transform of each bone. If only the Animator component could supply an array of Matrix4x4, one per bone, instead of driving transforms, you could probably double the number of Mecanim animations driving instances. However, this would still end up as a fraction of the potential instances that could be rendered.
It’s All Rather Complex
Once you have your chosen method up and running, things are still more complex to deal with than normal, as the main point of both systems is to completely remove/detach the rendering of instances from Unity’s gameObject model.
It’s the gameObject model that can really hammer performance once you scale up to 10,000 or more objects. Modifying the transforms, updating bone transforms, etc. all adds up as overhead. Both the suggested systems avoid a gameObject per instance and should instead work with arrays of position/rotation data (Matrix4x4), but this means it’s somewhat harder to create a generic system that could easily be plugged into any project, and it would require the developer to drive their game more via code.
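To illustrate what ‘arrays instead of gameObjects’ means in practice, here is a tiny sketch in Python. The `CrowdData` class, its fields and the straight-line movement are all hypothetical; the point is only that an instance is an index into flat arrays rather than an object with its own components.

```python
import math


class CrowdData:
    """Instance state held in flat arrays; one index is one instance."""

    def __init__(self, count):
        self.x = [0.0] * count
        self.z = [0.0] * count
        self.heading = [0.0] * count    # radians, 0 means facing +z
        self.anim_time = [0.0] * count  # seconds into the current clip

    def update(self, dt, speed):
        """Move every instance forward and advance its animation."""
        for i in range(len(self.x)):
            self.x[i] += math.sin(self.heading[i]) * speed * dt
            self.z[i] += math.cos(self.heading[i]) * speed * dt
            self.anim_time[i] += dt


crowd = CrowdData(2)
crowd.heading[1] = math.pi / 2  # second instance faces +x
crowd.update(dt=0.5, speed=2.0)
print(crowd.x, crowd.z, crowd.anim_time)
```

Arrays like these can be uploaded to the GPU in one go each frame, but game code then has to talk to the crowd by index, which is the ‘drive your game more via code’ cost described above.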
So Many Possibilities
Currently I’m undecided as to which method is best, or indeed if there even is a best. I suspect each has its place depending upon project requirements. Though both have some serious drawbacks, I believe they can be addressed with some lateral thinking and effort.
Beyond that, there is the consideration of variation. It’s all very well rendering 10,000 instances of the same model, but even if the animation of each is independent, they all look the same. Colour tinting on its own isn’t enough, so considerable effort will have to go into finding the most efficient methods of creating variations from the same input data. There are a number of avenues to pursue for this, from simple instancing of parts (e.g. different heads, helmets, weapons, clothing) to more advanced concepts such as the gradient mapping Valve used in Left 4 Dead.
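As a very rough idea of how gradient mapping gives variation from shared data, the sketch below looks up a colour from a ramp using a greyscale value and picks the ramp per instance. This is a generic illustration of the concept, not Valve’s implementation; the ramps, the names and the choice of ramp by instance ID are all made up.

```python
def sample_gradient(ramp, luminance):
    """Linearly interpolate a colour from an evenly spaced ramp."""
    luminance = min(max(luminance, 0.0), 1.0)
    scaled = luminance * (len(ramp) - 1)
    low = int(scaled)
    high = min(low + 1, len(ramp) - 1)
    t = scaled - low
    return tuple(a + (b - a) * t for a, b in zip(ramp[low], ramp[high]))


# Hypothetical colour ramps, dark to light.
RED_CLOTH = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (1.0, 0.5, 0.5)]
BLUE_CLOTH = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.5), (0.5, 0.5, 1.0)]
RAMPS = [RED_CLOTH, BLUE_CLOTH]


def instance_colour(instance_id, luminance):
    """The same greyscale texel gives a different colour per instance."""
    return sample_gradient(RAMPS[instance_id % len(RAMPS)], luminance)


print(instance_colour(0, 0.25), instance_colour(1, 0.75))
```

One greyscale texture plus a handful of small ramps can then stand in for many full-colour textures, which fits the goal of creating variation from the same input data.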