Is there any way to run an EntityCommandBuffer in parallel?
For example, I made a stress test with 500k entities that each perform some action. Some of these actions can turn out to be invalid, in which case an entity is created with a simple error component holding the reason. This is written via an ECB, and because of the sheer number of errors in one frame, the ECB got completely overwhelmed. From what I saw, playback doesn’t seem to run in Burst (but I could be wrong about that), and it’s not running in parallel, which could help a lot here.
So, long story short, is there currently a way to do this, or is it on Unity’s to-do list?
It uses batching to make instantiations significantly faster. And technically, something like this could be made parallel, since the types of operations are significantly limited, which removes ordering constraints and concerns about entity removal. So while my implementation is just a fast main-thread operation, it could be possible to modify it to do component replication in parallel.
Cool! I’ll take a look at it.
Why do you say it’s likely never possible? I mean, there are circumstances where race conditions occur, but I would argue that with the right sorting, and by keeping the same entities on one thread, it should be theoretically possible to have multi-threaded ECBs.
I’ve asked this in the past as ECBs are a weird bottleneck. Right now it’s best to avoid them, although that’s often really complicated and easier said than done.
And on the topic, I think it’s just weird that something so simple then takes huge amounts of CPU time.
Such a sort is mathematically impossible to compute faster than simply executing the commands sequentially. An entityB can be instantiated from an entityA at any point during the ECB playback. Prior to the instantiation, entityA might have had components added which need to be included during instantiation. And after the instantiation, entityA might be destroyed. On top of that, you need to ensure that not just the entities but also all the chunks they will ever live in during playback are made exclusive to a given thread. Those chunks need to be deterministic, which means you either conservatively generate new chunks and fragment everything, or you need to “dry run” all the commands in the command buffer to figure out all the chunks each entity will pass through to reach its final destination. Those chunks need to be mutually exclusive per thread.
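To make the ordering problem concrete, here is a minimal, language-agnostic sketch in Python (not Unity C#, and not how ECB playback is actually implemented). The `play_back` function and the command tuples are hypothetical names invented for illustration. It only shows that an instantiate command captures the source entity as it exists at that moment, so moving a single command changes the result.

```python
def play_back(commands):
    """Apply commands strictly in recorded order.

    Returns a toy 'world': a dict mapping entity name -> set of component names.
    All command names here are hypothetical, not the Unity API.
    """
    world = {}
    for cmd in commands:
        op = cmd[0]
        if op == "create":
            world[cmd[1]] = set()
        elif op == "add":
            world[cmd[1]].add(cmd[2])
        elif op == "instantiate":
            _, source, target = cmd
            # The copy sees the source exactly as it is at this point in playback.
            world[target] = set(world[source])
        elif op == "destroy":
            del world[cmd[1]]
    return world


recorded = [
    ("create", "A"),
    ("add", "A", "Health"),
    ("instantiate", "A", "B"),  # B copies A while A only has Health
    ("add", "A", "Poison"),     # added after the copy, so B must not get it
    ("destroy", "A"),
]

# Same commands, but the second add is moved ahead of the instantiate.
reordered = [recorded[0], recorded[1], recorded[3], recorded[2], recorded[4]]

in_order_result = play_back(recorded)
reordered_result = play_back(reordered)
```

In this toy model, B ends up with only `Health` in the recorded order but with both `Health` and `Poison` after the swap, and moving the destroy ahead of the instantiate would fail outright. Any parallel scheme has to preserve these cross-entity relationships, which is the part that is hard to do cheaply.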
Anyways, the best way to get speed is to add more constraints to the problem. InstantiateCommandBuffer adds the constraint of only allowing one type of command to happen, such that there are no inter-entity dependencies. For that reason, it can be made a lot faster. There is one part of the playback algorithm that could be multi-threaded (it is an IJobFor on which I call .Run()). Normally doing so is a loss, but in your case, since you are overwriting all the components expected for an entity, it might actually be a win. If there were a way to batch instantiate an entity and provide a list of components to leave uninitialized, then that would lead to a solution with maximum multi-threaded performance.
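As a rough illustration of why the single-command constraint helps, here is a small Python sketch of the two-stage shape described above. It is not the actual InstantiateCommandBuffer code, and `batch_instantiate`, `prefab`, and `overrides` are hypothetical names. Stage 1 creates all instances in one batch on a single thread. Stage 2 writes per-instance component values, and since each index touches only its own instance, the work can be split into disjoint ranges.

```python
from concurrent.futures import ThreadPoolExecutor


def batch_instantiate(prefab, overrides, workers=4):
    """Toy two-stage playback (hypothetical, not the Unity API).

    prefab:    dict of component name -> default value
    overrides: one dict per instance with the values to write
    """
    count = len(overrides)

    # Stage 1 (single-threaded): create every instance in one batch.
    instances = [dict(prefab) for _ in range(count)]

    # Stage 2: each index owns exactly one instance, so ranges never overlap.
    def write_range(start, end):
        for i in range(start, end):
            instances[i].update(overrides[i])

    step = max(1, -(-count // workers))  # ceiling division, at least 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(write_range, start, min(start + step, count))
            for start in range(0, count, step)
        ]
        for future in futures:
            future.result()  # re-raise any worker exception

    return instances


prefab = {"Position": (0, 0), "Error": None}
overrides = [{"Error": f"reason {i}"} for i in range(10)]
errors = batch_instantiate(prefab, overrides)
```

Python threads will not give a real speedup here because of the interpreter lock, so this only shows the structure: no instance depends on another, which is what would let the equivalent IJobFor be scheduled in parallel instead of run on the main thread.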