Important: I have little understanding of GPUs, and have only done procedural mesh generation in Unity.
I think we all agree that
gameObject.transform.position = newPosition;
is much faster than
Vector3[] vertices = gameObject.GetComponent<MeshFilter>().mesh.vertices;
for (int i = 0; i < vertices.Length; i++) { vertices[i] = newPosition; }
gameObject.GetComponent<MeshFilter>().mesh.vertices = vertices;
To improve performance, I constantly “cache” procedurally generated meshes in GameObjects, then manipulate those GameObjects using .transform.position, .rotation and .localScale. The problem? I don’t fully understand why this works.
3 questions (+ bonus) that will greatly improve my understanding of what’s going on underneath the hood:
1. Does a hierarchy of parent & children vertices exist at the GPU level?
i.e. Using GameObject.transform.position just moves the parent vertex
(so all relative child vertices automatically move with it)
2. If there is no hierarchy, and on the GPU (in BOTH cases) each vertex needs to be shifted to newPosition, why is GameObject.transform.position so much faster? Does transform.position, transform.rotation, transform.localScale send special commands to the GPU telling it to modify the mesh (so all .vertices ARE modified, but they never leave the GPU so it’s mega fast)?
3. Is there a performance difference (assume no colliders) between rendering a moving GameObject and rendering a still GameObject? From what I’ve seen, there’s no difference, which kind of boggles my mind - especially for large meshes.
4. Bonus (a tangent, but likely related): Why doesn’t a 1x1 Texture take longer to render than a 500x500 Texture when texturing a LARGE cube? i.e. Blitting a 1x1 texture 250,000 times feels like it should be much slower than blitting a 500x500 texture once.
Any insight is truly appreciated.
Feel free to dumb it down
Afaik, the position/rotation/scale get combined into a 4x4 matrix, which all vertices are multiplied against, moving them to their proper position. Performance is probably virtually the same whatever the matrix contains, since you have to multiply anyway, making moving / rotating / scaling sort-of free. (I could be horribly wrong here though.)
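To make the matrix idea concrete, here is a minimal CPU-side sketch of what the vertex stage does for every vertex. The class name TrsSketch and its Apply method are made up for illustration; Matrix4x4.TRS and MultiplyPoint3x4 are the actual Unity calls. On the GPU the mesh data normally stays in video memory and only the matrix changes when the transform changes, whereas assigning mesh.vertices means the vertex data has to be sent again.

```csharp
using UnityEngine;

// Hypothetical helper, for illustration only.
public static class TrsSketch
{
    // Combines position/rotation/scale into one 4x4 matrix and multiplies
    // every local-space vertex by it, giving the world-space positions.
    public static Vector3[] Apply(Vector3[] localVertices, Vector3 position,
                                  Quaternion rotation, Vector3 scale)
    {
        Matrix4x4 localToWorld = Matrix4x4.TRS(position, rotation, scale);
        var result = new Vector3[localVertices.Length];
        for (int i = 0; i < localVertices.Length; i++)
        {
            // Same cost per vertex whatever the matrix contains.
            result[i] = localToWorld.MultiplyPoint3x4(localVertices[i]);
        }
        return result;
    }
}
```

The loop does the same work for an identity matrix as for any other, which is why a moving object should cost about the same to render as a still one.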
For question 4, I’m not sure how it’s done, but it’d be something like frac(originalUV * scale), scaling the uv-map from (0, 1) range to (0, 500) and then taking only the fraction so it repeats 500 times between (0, 1). And it should be quicker since the texture would be cached better.
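A rough sketch of that frac idea in C#, assuming a hypothetical helper called UvTilingSketch.Tile. In practice you would normally not write this yourself: setting the texture's wrap mode to Repeat and scaling the UVs (for example through the material's tiling) has the sampler do the wrapping for you.

```csharp
using UnityEngine;

// Hypothetical helper, for illustration only.
public static class UvTilingSketch
{
    // Scales a UV from the (0, 1) range up to (0, tiles) and keeps only the
    // fractional part, so the texture repeats `tiles` times across the face.
    public static Vector2 Tile(Vector2 uv, float tiles)
    {
        Vector2 scaled = uv * tiles;
        return new Vector2(scaled.x - Mathf.Floor(scaled.x),
                           scaled.y - Mathf.Floor(scaled.y));
    }
}
```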
Thanks very much. With that more-or-less settled, here’s what I’m wondering…
Can I use 1 GameObject with 1 mesh, that has “submeshes” EACH with its own 4x4 matrix?
This would allow me to have 1 DrawCall while being able to position, rotate and scale groups of vertexes QUICKLY (since I don’t need to pull them into code). I know Unity has submeshes, is this what they’re used for?
Submeshes aren’t magic. If they’re using the same material and are capable of being batched then they will be, just like separate GameObjects can. I don’t believe they have separate matrices though, as it’s more of a multiple material convenience than anything.
Cache. The GPU cache holds a little chunk of the texture to use it. If the next pixel you use is already in the cache it doesn’t have to fetch a new chunk of the texture before reading the pixel value. For a 1x1 the whole texture will fit in the cache at once, so it’ll never need to fetch a new chunk.
Also keep in mind that blitting is at least as much about the number of pixels written as it is about the number of pixels read. Plus there’s how they’re written, because it’s usually not just blitting these days.
Thanks a bunch everyone, this is really helping me understand things.
Last and related question:
If I create a rigged animated character and import that into Unity, how exactly do they animate?
We’ve established that a GameObject gets 1 4x4 matrix…
so what’s happening underneath the hood when an arm or a leg moves?
Do they rebuild all the vertexes (deadly slow)?
Does every limb get a separate child GameObject
(resulting in tons of draw calls unless you have an extremely small amount of vertexes)?
For a skinned mesh the bones are represented using empty GameObjects, and then each vertex in your main mesh is associated with 0-4 bones using a weight, and then these vertices are transformed (on the GPU I imagine) to follow the bones it’s associated with. This separate GO per limb technique you mention is also used in some cases (especially for vehicles) when you don’t need any kind of bending or twisting in the mesh.
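The weighted transform described above can be sketched on the CPU like this. It is only an illustration of the usual linear blend skinning formula, not Unity's internal code; SkinningSketch and SkinVertex are made-up names, while BoneWeight and Matrix4x4 are the real Unity types. The sketch assumes bones[i] holds the bone's current local-to-world matrix and bindPoses[i] holds the matching bind pose matrix.

```csharp
using UnityEngine;

// Hypothetical helper, for illustration only.
public static class SkinningSketch
{
    // Blends up to four bone transforms for one vertex.
    // bones[i]     : current local-to-world matrix of bone i
    // bindPoses[i] : bind pose of bone i (takes the mesh-space vertex into
    //                that bone's local space as it was when the mesh was bound)
    public static Vector3 SkinVertex(Vector3 v, BoneWeight w,
                                     Matrix4x4[] bones, Matrix4x4[] bindPoses)
    {
        Vector3 result = Vector3.zero;
        result += w.weight0 * (bones[w.boneIndex0] * bindPoses[w.boneIndex0]).MultiplyPoint3x4(v);
        result += w.weight1 * (bones[w.boneIndex1] * bindPoses[w.boneIndex1]).MultiplyPoint3x4(v);
        result += w.weight2 * (bones[w.boneIndex2] * bindPoses[w.boneIndex2]).MultiplyPoint3x4(v);
        result += w.weight3 * (bones[w.boneIndex3] * bindPoses[w.boneIndex3]).MultiplyPoint3x4(v);
        return result;
    }
}
```

Unused slots have weight 0 and bone index 0, so they contribute nothing. A vertex bound to a single bone with weight 1 simply follows that bone rigidly.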
Thanks again for all the help @Darkcoder_1, truly appreciated.
“and then these vertices are transformed (on the GPU I imagine) to follow the bones it’s associated with”
This speaks to the core mystery I’m trying to understand. I’m doing lots of procedural mesh generation and would LOVE to tap into GPU vertex transformations if they exist. Specifically… I want ONE GameObject with ONE giant mesh where I apply DIFFERENT GPU transforms (4x4 matrix) to PARTS of that mesh. I would get the performance of 1 draw call, with the flexibility of multiple transforms. Right now every time I need transforms I either create a new GameObject with a separate mesh (more draw calls), or I rebuild the mesh every frame (slow performance). A lose-lose.
What you’re describing is a pretty silly optimization. You could certainly do it, but having 1 draw call doesn’t mean your game renders quickly; after all, you’re just shifting the processing elsewhere. If your game is experiencing performance issues then just use the profiler and optimize your scripts, implement LOD, bake your textures and models into atlases, etc. This kind of shader silliness should be the last resort after squeezing everything else dry.
This IS a last resort. Everything else HAS been squeezed dry. I’ve profiled. I’ve got LOD. I’ve baked textures. I can’t do much in shaders since colliders need to be manually synced with mesh manipulations (a mesh collider is too slow).
Everything in the game is procedurally generated.
I am constantly transforming vertices I generate - CONSTANTLY (in some cases 60fps)
That’s what sets the game apart.
Here’s an outdated video for our game:
Endlight
“smash through a concrete kaleidoscope”
(please watch in 720p 60fps)
Being able to transform parts of a mesh would make the tunnels even more crazy.
On mobile I’ve maxed out draw calls and changes to .vertices.
I haven’t maxed out GPU transforms.
So… a silly optimization to you, but a killer optimization for me!
I just don’t know how to do it.
Looks cool. One issue with doing this shader transformation is that it requires setting shader arrays, and I’m pretty sure there’s no built in way to do this in Unity. Additionally, there would be no collision with your player, as the mesh itself wouldn’t be updating.
Because your game is based on so many cubes, you should try merging them as much as possible, because in the demo video it looked like the majority of the cubes were fairly static, or were moving in locked groups. If you want fast dynamic movement then another thing you might want to experiment with is using Shuriken’s mesh rendering, and setting the particles array directly.
You can combine meshes into a single skinned mesh that will only have one draw call, with the individually moving parts as separate “bones”. Here’s a script that does it: http://wiki.unity3d.com/index.php/MeshMerger
That’s probably the easiest way to do it. You could probably also get the same result as this with a shader somehow (passing in the matrices to transform specific vertices). In player settings you can try turning on GPU Skinning which may help CPU load (needs DirectX 11). Check the CPU and GPU performance with/without the combining into a skinned mesh.
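For anyone who wants to see the shape of the skinned mesh approach without reading the whole wiki script, here is a minimal sketch. It is not the MeshMerger script; RigidPartsSkinner and Combine are hypothetical names, and the sketch assumes every part can share one material and that the combined mesh stays under the default 16-bit index limit (65,535 vertices). The Mesh and SkinnedMeshRenderer members used are real Unity API. UVs and vertex colors are left out to keep it short.

```csharp
using System.Collections.Generic;
using UnityEngine;

// Hypothetical helper, for illustration only.
public static class RigidPartsSkinner
{
    // Merges the meshes of `parts` into one SkinnedMeshRenderer on `root`.
    // Every vertex of part i is bound with weight 1 to bone i, which is the
    // part's own Transform, so moving that Transform moves just those vertices.
    public static SkinnedMeshRenderer Combine(GameObject root, MeshFilter[] parts,
                                              Material sharedMaterial)
    {
        var vertices = new List<Vector3>();
        var triangles = new List<int>();
        var weights = new List<BoneWeight>();
        var bones = new Transform[parts.Length];
        var bindPoses = new Matrix4x4[parts.Length];

        for (int i = 0; i < parts.Length; i++)
        {
            Mesh part = parts[i].sharedMesh;
            int baseIndex = vertices.Count;
            // Bring the part's vertices into the root's local space.
            Matrix4x4 partToRoot = root.transform.worldToLocalMatrix *
                                   parts[i].transform.localToWorldMatrix;

            foreach (Vector3 v in part.vertices)
            {
                vertices.Add(partToRoot.MultiplyPoint3x4(v));
                weights.Add(new BoneWeight { boneIndex0 = i, weight0 = 1f });
            }
            foreach (int t in part.triangles)
            {
                triangles.Add(baseIndex + t);
            }

            bones[i] = parts[i].transform;
            bindPoses[i] = bones[i].worldToLocalMatrix * root.transform.localToWorldMatrix;

            // Hide the original renderer so the part is not drawn twice.
            MeshRenderer original = parts[i].GetComponent<MeshRenderer>();
            if (original != null) original.enabled = false;
        }

        var mesh = new Mesh();
        mesh.SetVertices(vertices);
        mesh.SetTriangles(triangles, 0);
        mesh.boneWeights = weights.ToArray();
        mesh.bindposes = bindPoses;
        mesh.RecalculateNormals();
        mesh.RecalculateBounds();

        var smr = root.AddComponent<SkinnedMeshRenderer>();
        smr.sharedMesh = mesh;
        smr.bones = bones;
        smr.sharedMaterial = sharedMaterial;
        return smr;
    }
}
```

After calling Combine, setting parts[i].transform.position (or rotation, or localScale) moves that part without touching mesh.vertices. The sketch does not touch any colliders on the part GameObjects, so they keep following their Transforms as before.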
I felt your pain since I’d been struggling with the same thing - and the information is hard to find when you don’t really know what you’re looking for! I was looking at doing the transforms in a shader for a while before I found out how to do it more easily with the built-in skinning support.
I’ve actually got mine working now and I can also say it’s great. Major performance boost. Check your profiler for how long the actual skinning work is taking, but I’ve got quite a lot of separate parts and it’s still only a small percent. You can turn off GPU Skinning to check how much CPU time it’ll take for people with pre-DX11 graphics cards.
Hey guys, sorry to necro but this is something I’m getting into recently.
I got here through the idea of using bones to drive parts of a merged mesh; however, I’m pursuing something a little bit insane. I’m looking to drive characters in groups by their bones (to position them), and then animate them. The second part is where I’m currently coming up blank. Any ideas on playing animations on these mesh components? I don’t think it’d be worth it to manually manipulate the vertices, and it would be a regression in performance from simply keeping them all separate.
I’m pretty sure you can do it. Play the animations on your original character’s transforms, and have those SkinnedMeshRenderers combined into one big SkinnedMeshRenderer, and I’m pretty sure you can get the animations to just keep working. I think you’ll need to copy the bones from the individual character SkinnedMeshRenderers into the combined SkinnedMeshRenderer, which the script above isn’t doing. Sorry I can’t give you an exact solution or a guarantee, but I’m pretty sure it’s possible.
I have some animations in my combined SkinnedMeshRenderers, but they’re not bone animations, they’re just animating transforms (and they’re also legacy animations). There seems to be some code around to combine SkinnedMeshRenderers into one like this and this.
Of course you’d also have to check that your performance actually improves!