The other day, I was reading about how the EntityManager batch API (the methods that take a query as an argument instead of a single entity) is a lot faster, because it can update whole chunks in place rather than moving each entity between chunks.
I decided to do a quick test on one of my systems that has to add multiple components to lots of entities at once, but can’t use a query. The code originally used an EntityCommandBuffer, but the playback of the EntityCommandBuffer on the main thread was the slow part, so for simplicity it can be represented as the equivalent direct EntityManager calls on the main thread:
var buffer = EntityManager.GetBuffer<BufferComponent>(bufferEntity);
// Copy the entities out first: the structural changes below invalidate the buffer.
var entityArray = buffer.ToNativeArray(Allocator.Temp);
for (int i = 0; i < entityArray.Length; i++)
{
    // Each call is a separate structural change on a single entity.
    EntityManager.AddComponent(entityArray[i], typeof(TagComponent));
    EntityManager.AddSharedComponentData<RenderMesh>(entityArray[i], renderMesh);
}
entityArray.Dispose();
My idea was this: instead of adding both components to each entity in the loop, would it be faster to add a temporary tag to each entity and then use a query and the batch API to add the components I actually care about? Like so:
var tempQuery = GetEntityQuery(
    ComponentType.ReadOnly<TempTagComponent>()
);
var buffer = EntityManager.GetBuffer<BufferComponent>(bufferEntity);
var entityArray = buffer.ToNativeArray(Allocator.Temp);
for (int i = 0; i < entityArray.Length; i++)
{
    // Only the temporary tag is added one entity at a time.
    EntityManager.AddComponent(entityArray[i], typeof(TempTagComponent));
}
entityArray.Dispose();
// Everything else goes through the query-based batch API.
EntityManager.AddComponent(tempQuery, typeof(TagComponent));
EntityManager.AddSharedComponentData<RenderMesh>(tempQuery, renderMesh);
EntityManager.RemoveComponent(tempQuery, typeof(TempTagComponent));
In some (admittedly unscientific) tests, the original approach would take about 47ms of main thread time to operate on ~1650 entities.
To my shock, the revised approach takes only 11.5ms of main thread time to operate on 1900 entities!
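Since the two runs touched different numbers of entities, the per-entity cost is the fairer comparison. Here is a minimal, standalone C# sketch of that arithmetic using only the figures quoted above. `TimingComparison` and `MsPerEntity` are names made up for illustration and are not part of any Unity API:

```csharp
using System;

public static class TimingComparison
{
    // Hypothetical helper: normalises a total main-thread time by entity count.
    public static double MsPerEntity(double totalMs, int entityCount)
    {
        return totalMs / entityCount;
    }

    public static void Main()
    {
        double loop = MsPerEntity(47.0, 1650);   // original per-entity loop
        double batch = MsPerEntity(11.5, 1900);  // temp tag + batch API

        Console.WriteLine("loop  ms/entity: " + loop);
        Console.WriteLine("batch ms/entity: " + batch);
        Console.WriteLine("speed-up factor: " + (loop / batch));
    }
}
```

On these numbers the revised approach comes out at roughly 4.7 times cheaper per entity, with the caveat that the underlying measurements were rough.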
My theory here is that the loop approach is actually way worse than it looks - each entity is getting moved between chunks twice (once for each component add), as the command buffer isn’t able to reason about “where it will end up” after the whole buffer has been played back.
By contrast, the second approach only results in one chunk move per entity despite having two extra operations per entity.
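To make that counting argument concrete, here is a toy model in plain C#. It is not Unity code and says nothing about how EntityManager is actually implemented: it simply assumes that every per-entity component add costs one chunk move, and that query-based batch operations cost no per-entity moves at all. `ChunkMoveModel` and its methods are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class ChunkMoveModel
{
    // Assumption: adding a component to a single entity changes its archetype,
    // which is counted here as one move into a different chunk.
    public static int PerEntityAdds(int entityCount, params string[] components)
    {
        int moves = 0;
        for (int e = 0; e < entityCount; e++)
        {
            var archetype = new HashSet<string>();
            foreach (string component in components)
            {
                if (archetype.Add(component))
                {
                    moves++; // new archetype, so the entity is moved
                }
            }
        }
        return moves;
    }

    // Tag each entity individually (one move each), then apply the real
    // components per query. Batch operations are modelled as zero per-entity
    // moves; that is the theory from the post, not a measured fact.
    public static int TagThenBatch(int entityCount, string tempTag,
                                   string[] components, out int batchOps)
    {
        int moves = PerEntityAdds(entityCount, tempTag);
        batchOps = components.Length + 1; // batch adds plus the temp-tag removal
        return moves;
    }

    public static void Main()
    {
        int loopMoves = PerEntityAdds(1650, "TagComponent", "RenderMesh");
        int batchOps;
        int batchMoves = TagThenBatch(1900, "TempTagComponent",
            new[] { "TagComponent", "RenderMesh" }, out batchOps);

        Console.WriteLine(loopMoves);  // 3300
        Console.WriteLine(batchMoves); // 1900
        Console.WriteLine(batchOps);   // 3
    }
}
```

Under those assumptions the loop performs two moves per entity while the tag-then-batch version performs one, plus a small, fixed number of query-level operations.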
In light of all this, I have a few questions:
- First, does this seem right? Or am I seeing some weird behaviour?
- EntityManager has AddComponent and RemoveComponent methods that accept a NativeArray of entities - do these benefit from the same gains as the query methods, even though they’re not operating on chunks?
- If so, is there a plan to add an AddSharedComponentData overload that accepts a NativeArray?
- If not, is there a more intuitive workflow to do this kind of operation efficiently? Or are there planned improvements to EntityCommandBuffer to allow it to automatically reason about these kinds of optimisations? (I’m sure that’s not as simple as it sounds!)
EDIT: it seems to be about 2ms faster again using the available NativeArray API to add the TempTagComponent:
var tempQuery = GetEntityQuery(
    ComponentType.ReadOnly<TempTagComponent>()
);
var buffer = EntityManager.GetBuffer<BufferComponent>(bufferEntity);
// Reinterpret the buffer as Entity so it can be passed to the NativeArray overload.
var entityArray = buffer.Reinterpret<Entity>().ToNativeArray(Allocator.Temp);
EntityManager.AddComponent(entityArray, typeof(TempTagComponent));
entityArray.Dispose();
EntityManager.AddComponent(tempQuery, typeof(TagComponent));
EntityManager.AddSharedComponentData<RenderMesh>(tempQuery, renderMesh);
EntityManager.RemoveComponent(tempQuery, typeof(TempTagComponent));