Use case for Mesh.MeshData?

Hi!

I’ve been investigating the Mesh.MeshData API to build meshes from jobs. So far I think I’m not fully grasping its intended use, as it seems a bit silly.

Calling Mesh.AllocateWritableMeshData() returns a MeshDataArray containing multiple MeshData structs, each one holding vertex/index data for a single mesh. We’re supposed to modify this data (possibly in a job), and then commit it to a list of meshes using Mesh.ApplyAndDisposeWritableMeshData(). Is this correct?
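
For reference, this is roughly the workflow I mean. A minimal sketch, assuming a List<Mesh> and known per-mesh vertex/index counts; FillMeshJob and the vertex layout are made up for illustration:

using System.Collections.Generic;
using Unity.Jobs;
using UnityEngine;
using UnityEngine.Rendering;

public static class MeshDataExample
{
    // Hypothetical job: one parallel-for iteration fills one MeshData.
    struct FillMeshJob : IJobParallelFor
    {
        public Mesh.MeshDataArray data;
        public int vertexCount, indexCount; // per-mesh counts, kept constant here for brevity

        public void Execute(int index)
        {
            Mesh.MeshData meshData = data[index];
            meshData.SetVertexBufferParams(vertexCount, new VertexAttributeDescriptor(VertexAttribute.Position));
            meshData.SetIndexBufferParams(indexCount, IndexFormat.UInt32);
            // ...write meshData.GetVertexData<Vector3>() / meshData.GetIndexData<uint>(),
            //    then set meshData.subMeshCount and call meshData.SetSubMesh(...)...
        }
    }

    public static void BuildMeshes(List<Mesh> meshes, int vertexCount, int indexCount)
    {
        Mesh.MeshDataArray dataArray = Mesh.AllocateWritableMeshData(meshes.Count);
        new FillMeshJob { data = dataArray, vertexCount = vertexCount, indexCount = indexCount }
            .Schedule(meshes.Count, 1).Complete();
        // uploads to the Mesh objects and disposes the MeshDataArray:
        Mesh.ApplyAndDisposeWritableMeshData(dataArray, meshes);
    }
}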

If so, this results in rather coarse threading granularity and terrible workload balancing, since you can only do a parallel for over meshes. If you have 100 small meshes this is fine; if you have 3 huge meshes, only 3 worker threads will be busy. Alternatively, you could schedule one job per mesh and parallelize within each mesh, but if the meshes are small this is very wasteful.

My current implementation solves this by creating a couple of large NativeArrays for vertex and index data (their length equal to the combined vertex/index count of all meshes) and then doing a parallel for over vertices or triangles, regardless of which mesh they belong to. Once the data is ready, I just call SetVertexBufferData() with the appropriate start and count values for each mesh.
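
In code, the upload step is just this (a sketch; the array and start/count names are mine, and the vertex/index buffer params are assumed to have been set up once when the meshes were created):

// allVertices / allIndices hold the combined data for every mesh;
// vertexStart/vertexCount and indexStart/indexCount describe each mesh's slice of them.
for (int i = 0; i < meshes.Length; ++i)
{
    meshes[i].SetVertexBufferData(allVertices, vertexStart[i], 0, vertexCount[i]);
    meshes[i].SetIndexBufferData(allIndices, indexStart[i], 0, indexCount[i]);
}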

In my tests this is much faster than using MeshData in all cases (many small meshes, few large meshes), since threads can work on data for all meshes simultaneously regardless of the size and/or number of meshes. The overall code is also a bit simpler. And if the number of vertices/triangles doesn’t change over time, it doesn’t involve allocating/deallocating memory every frame (unlike ApplyAndDisposeWritableMeshData, which disposes of the MeshData once applied).

Assuming I didn’t misunderstand the motivation behind Mesh.MeshData: is there any use case that justifies using it over an approach similar to the one I outlined above?


How I work with MeshData:

  • generate vertex information in a job (e.g. terrain, voxels) in parallel
  • set up MeshData in a job (if the vertex/index count is known ahead of time, this can be done while vertices are generated)
  • assign vertices in parallel (including, for instance, calculating normals or applying position offsets, scale, etc.)
  • assign indices in parallel at the same time the vertex-assignment jobs run (they do not depend on each other)
  • when all assignment jobs are complete, apply the MeshData (roughly sketched below)
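
Roughly, the per-mesh scheduling looks like this. This is a sketch rather than my actual code: the job names, generateHandle, meshDataArray and the count arrays are placeholders, and the vertex layout is assumed to be just a position:

Mesh.MeshData md = meshDataArray[i];
md.SetVertexBufferParams(vertexCount[i], new VertexAttributeDescriptor(VertexAttribute.Position));
md.SetIndexBufferParams(indexCount[i], IndexFormat.UInt32);

// vertex and index assignment for this mesh run concurrently, both after generation
// (if the safety system objects to the two concurrent writers, chain them instead):
JobHandle verts   = new AssignVerticesJob { vertices = md.GetVertexData<Vector3>() }
                        .Schedule(vertexCount[i], 64, generateHandle);
JobHandle indices = new AssignIndicesJob  { indices  = md.GetIndexData<uint>() }
                        .Schedule(indexCount[i], 64, generateHandle);
perMeshHandles[i] = JobHandle.CombineDependencies(verts, indices);

// ...once all per-mesh handles are complete:
Mesh.ApplyAndDisposeWritableMeshData(meshDataArray, meshes);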

I process each mesh's data individually. The overhead of creating multiple arrays (one per mesh) is negligible.

Of course this approach may not work for all cases. I tested it with primitives and terrain (plane or voxel chunks). What is your use case? What kind of meshes are you generating?

To me it sounds like you are thinking that you'll have one job for each mesh. As you can see in my example, granularity can be further split up into vertex data generation, assigning vertices, and assigning indices, and possibly even a job for the MeshData setup (one per mesh, or a parallel for over all meshes). The latter, however, requires knowing the vertex/index count up front.

PM me if you'd like to see example code.

Thanks for your input on this! 🙂 I’m thinking about either:

A) one IJobParallelFor per mesh, parallelizing over that mesh’s data.
B) a single IJobParallelFor parallelizing over meshes.

A) is best when you have few large meshes; B) is best when you have many small meshes. However, both are suboptimal for my use case, which is marching cubes generating an arbitrary number of meshes. I don’t know the size or number of meshes beforehand, so I can’t optimize for either case using MeshData.

I also work on vertices and indices separately like you do (schedule 2 jobs, then combine their handles), but the principle is the same: create a large vertex array to hold data for all meshes, parallelize over all vertices from all meshes, then assign a slice of the single large array to each mesh. Same for indices.

I think we have different definitions of “granularity”: I’m referring to the amount of work done by a single batch/work item in a parallel for job, not to having many smaller jobs.

Yes, these are the two cases I have in mind. I guess the core of my question is: why force us to choose between one job per mesh and one job for all meshes, when you could easily parallelize over all triangles/vertices of all meshes with a simpler API? That is better in both cases (many small meshes, few large meshes) as well as anything in between.

Imho, MeshData could be improved by having only two arrays, one for vertices and another for indices, plus GetMeshIndexStartAndCount and GetMeshVertexStartAndCount methods that return each mesh’s start offset and count within those arrays. This way you can still access each individual mesh if you so choose, but you can also process all vertices in parallel. Similar to how submeshes work, really.
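
To illustrate, usage of that hypothetical API could look something like this (none of these members exist today; DeformAllJob and MyVertex are also made up):

// hypothetical API, not part of Unity today:
NativeArray<MyVertex> allVerts = meshDataArray.GetAllVertexData<MyVertex>(); // one array covering every mesh
new DeformAllJob { vertices = allVerts }.Schedule(allVerts.Length, 128).Complete();

for (int i = 0; i < meshDataArray.Length; ++i)
{
    // proposed methods: where does mesh i live inside the shared arrays?
    meshDataArray.GetMeshVertexStartAndCount(i, out int vStart, out int vCount);
    meshDataArray.GetMeshIndexStartAndCount(i, out int iStart, out int iCount);
    // ...per-mesh work on those slices, if needed...
}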

Currently we can do this manually (which is what I do). To me, MeshData just seems like a poorly thought out wrapper around this, which is why I asked about it.


Hmmm, interesting. Though I'm still not sure I understand. Can you show in pseudo-code what you need to do right now vs. what you would like it to be?

My problem understanding this is with generating all vertices for all meshes at once - how do you determine where one mesh starts and ends?

For instance, assuming we were talking about voxel chunks, each chunk would be a separate mesh but also a separate data structure of voxel data, so it seems natural to process it that way, and you end up with one mesh per chunk (out of many). Your use case, however, seems to indicate that you intend to generate vertices more or less endlessly (fractals, LODs, random, etc.), but you need to split them into separate meshes every 65,536 vertices (for example), or based on some other condition such as one mesh per material.

Have you thought about generating the vertices the way you'd like to? Then you end up with a single large array of vertices. From there you spawn vertex-assign jobs using NativeSlices of your large array, one per mesh. This should be super fast thanks to Burst. If that better suits your use case, you should try it and check whether it is faster than your current approach (with optimizations enabled/maxed).

For each mesh you get a “start” into the large array, and a “count”. Both for vertices and indices, that’s it.

If you mean how to determine which mesh a vertex belongs to inside a job, there are many ways to do it depending on your use case. For mine, I allocate an array with an extra byte per triangle that encodes a mesh index. In the job I simply retrieve the index of the mesh it belongs to, and from that I know the first vertex index and vertex count of the current mesh if needed.
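
Building that mapping is cheap. A per-vertex variant (which is what the water example below uses) could be as simple as this, assuming the start/count of each mesh inside the big array are already known; use ushort/int instead of byte if you have more than 256 meshes:

// one byte per vertex holding the index of the mesh it belongs to:
vertexToMesh = new NativeArray<byte>(allMeshVertexCount, Allocator.Persistent);
for (int m = 0; m < meshes.Length; ++m)
    for (int v = start[m]; v < start[m] + count[m]; ++v)
        vertexToMesh[v] = (byte)m;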

This isn’t needed in many cases, though. For the sake of simplicity, let’s consider one of the simplest possible use cases: inflating a list of meshes (extruding vertices along their normals). Using my approach, you’d do (pseudocode):

Awake()
{
    // allocate a large array and copy all mesh vertex values to it, only needs to be done once:
    sourceVertices = new NativeArray<MeshVertex>(allMeshVertexCount, Allocator.Persistent);
    CopyMeshVertices(meshes, sourceVertices);

    // allocate destination array for inflated vertices:
    inflatedVertices = new NativeArray<MeshVertex>(allMeshVertexCount, Allocator.Persistent);
}

Update()
{
    // in a job, parallel over all vertices regardless of the mesh they belong to:
    void Execute(int i)
    {
        MeshVertex v = sourceVertices[i];
        v.pos += v.normal * inflate;
        inflatedVertices[i] = v;
    }

    // after the job has completed, upload each mesh's slice:
    for (int i = 0; i < meshes.Length; ++i)
    {
        meshes[i].SetVertexBufferData(inflatedVertices, start[i], 0, count[i]);
    }
}

Another use case, where you do need to know which mesh each vertex belongs to, would be displacing several water planes using wave sources/generators:

Awake()
{
  // allocate a large array and copy all mesh vertex values to it, only needs to be done once:
  sourceVertices = new NativeArray<MeshVertex>(allMeshVertexCount, Allocator.Persistent);
 
  // this should also fill the "vertexToMesh" array used to map from vertex to mesh in the job:
  CopyMeshVertices(meshes, sourceVertices, vertexToMesh);

  // allocate destination array for deformed vertices:
  deformedVertices = new NativeArray<MeshVertex>(allMeshVertexCount, Allocator.Persistent);
}

Update()
{
  // Determine which wave sources (a simple struct with position, frequency, amplitude) influence each mesh.
  // Could be done using spatial hashing, BVHs, or just a plain O(n²) loop over all meshes/sources:
  AssignWaveSources();

  // in a job, parallel over all vertices regardless of the mesh they belong to:
  void Execute(int i)
  {
    // array that maps from vertex index to mesh index
    int meshIndex = vertexToMesh[i];

    // accumulate the height contribution of all sources affecting this mesh
    float height = sourceVertices[i].pos.y;
    for (int k = 0; k < sourceCountForMesh[meshIndex]; ++k)
    {
      height += sourcesForMesh[meshIndex][k].HeightAt(sourceVertices[i].pos.xz);
    }

    MeshVertex v = sourceVertices[i];
    v.pos.y = height;
    deformedVertices[i] = v;
  }

  // after the job has completed, upload each mesh's slice:
  for (int i = 0; i < meshes.Length; ++i)
  {
    meshes[i].SetVertexBufferData(deformedVertices, start[i], 0, count[i]);
  }
}

For certain use cases, determining the mesh a vertex belongs to can be done with no extra memory cost (in a regular grid of meshes, for instance, you can just quantize the vertex position).
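
For example, with chunks laid out on a regular XZ grid, it is just a couple of divisions (chunkSize and gridWidth here are assumptions about the layout):

// map a vertex position to the chunk/mesh it falls into on a regular XZ grid:
int cx = Mathf.FloorToInt(vertexPos.x / chunkSize);
int cz = Mathf.FloorToInt(vertexPos.z / chunkSize);
int meshIndex = cz * gridWidth + cx;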

This way you can process all vertices of all meshes in parallel in a single job, spreading the workload evenly across all worker threads. This is something that, to my knowledge, you can't do using MeshData: you'd either parallelize across meshes, or schedule an individual job for each mesh (resulting in overhead from scheduling multiple jobs, and subpar workload distribution).

Even in this case, you have three options:

1) Do all chunks in parallel in a single job. Different chunks output different amounts of geometry, so once a worker thread finishes its chunk it sits idle until all other threads have finished theirs, wasting time.
2) Schedule one job per chunk, then combine dependencies. This is better, but it still doesn’t balance as nicely and potentially requires scheduling many jobs.
3) Allocate one large vertex array for all chunks, parallelize over all voxels in a single job, and then write slices of the large array back to each mesh.

I believe the third approach is faster than the other two, since you have a single job, and no idle threads. Depending on where your voxel data comes from it might not be easy/possible to do, though. My particular use case is realtime marching cubes/slices, dividing space into evenly-sized chunks, but each mesh can be composed of any number of chunks which I do not know in advance. Sometimes 1 mesh = 1 chunk, other times 1 mesh = >8 chunks.

Having separate arrays for each mesh in MeshData only allows for 1) or 2). Having a single large array and start/count for each mesh allows for all 3 approaches.

This is exactly what I do, but instead of spawning vertex assign jobs just to copy slices of the single large array to each individual array of a MeshData, I simply call mesh.SetVertexBufferData(myLargeArray, start, 0, count) for each mesh directly. As far as my tests go, it’s faster than using MeshData.

I guess this is what I don’t understand about the MeshData API: in my case it just seems like an extra, unnecessary step, and I struggle to think of use cases where it would actually be of any benefit compared to rolling your own scheme using the other existing mesh APIs.

First thing I did today was to test my system with the same overall load, but creating many meshes with few vertices and comparing that to few meshes with many vertices; overall both variants operated on the same amount of data (100,100,004 vertices).

The result shows that in my system many small meshes take roughly 15 times longer to complete.

That’s unsurprising, but it tells me I should maybe look into other ways of performing the “many small meshes” task. I’m curious to try out other options, as well as your “modify all vertices across meshes” approach.

Am I right to assume that in the waves example the “sea” isn’t just one single mesh due to the number of vertices and/or to enable frustum culling? Otherwise in theory it could be a single mesh.

I realize now that I was having a hard time understanding your use case because I had in mind what you would normally see in a game, like an explosion that destroys parts of the terrain and affects only parts of the world (a couple of chunks/meshes at a time). But in your case you seem to want to animate or modify the “entire world”, so to speak.

It may be worth checking inside the MeshData code (I use Rider, so I can conveniently look at the decompiled code) to see how the individual mesh buffers are obtained, in case you’re curious and don’t mind spending a day or two on a proof of concept. It is my understanding that the underlying data structure is actually one big array, and we just get returned slices of that array, one per mesh. If you can figure out a way (not sure if it’s possible) to get the underlying pointer to the mesh data and modify that directly, it could be what you’re looking for, and exposing this directly in the MeshData API would certainly be a good addition (=> feature request).

FWIW I’ll make a test with SetVertexBufferData to see if I get any speed improvements out of that too. But it’ll take me some time to change my system to work without the MeshData API.

And this is the Mesh.AllocateWritableMeshData code (it calls new on Mesh.MeshDataArray; the following is the constructor):

      internal unsafe MeshDataArray(int meshesCount)
      {
        this.m_Length = meshesCount >= 0 ? meshesCount : throw new InvalidOperationException(string.Format("Mesh count can not be negative (was {0})", (object) meshesCount));
        this.m_Ptrs = (IntPtr*) UnsafeUtility.Malloc((long) (UnsafeUtility.SizeOf<IntPtr>() * meshesCount), UnsafeUtility.AlignOf<IntPtr>(), Allocator.Persistent);
        Mesh.MeshDataArray.CreateNewMeshDatas(this.m_Ptrs, meshesCount);
        this.m_MinIndex = 0;
        this.m_MaxIndex = this.m_Length - 1;
        DisposeSentinel.Create(out this.m_Safety, out this.m_DisposeSentinel, 1, Allocator.TempJob);
      }

CreateNewMeshDatas is in C++ (InternalCall) though.

Correct. Frustum culling isn’t the only reason why you’d want to use separate meshes: each plane could have a different resolution, or, if used for more complex deformation, different physics/quality settings (viscosity, turbulence, number of Gerstner waves, noise octaves, etc.). They don’t have to be “pieces of a larger whole”.

Yes, in a typical voxel terrain setting you would only modify a handful of chunks at a time (get the chunks intersected by the explosion radius, delete voxels within radius, update these affected chunks). Also since chunk mesh updates are event-driven instead of happening every frame, it would probably be wise to trade chunk generation speed for better/more granular occlusion/frustum culling. The cost of a typical frame would be determined by how fast you’re able to render your chunks.

My use case involves regenerating chunks from scratch every frame, and since mesh generation is always costlier than rendering, I’m not as concerned about efficient culling as I am about efficient mesh data handling.

Thanks for the pointers! (no pun intended). Will surely try to get this to work.

I believe exposing slices is way more flexible; the API choice surprised me, so I thought maybe I was misunderstanding MeshData and/or using it incorrectly.
