I’m looking for ways to optimize a common pattern in my project:
Components of EntityA write to a DynamicBuffer of EntityB. Later, a job processes all of those buffers in parallel.
Right now, I’m using a BufferFromEntity to find and write to the DynamicBuffer on EntityB. But I’m wondering if there’s a better way.
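To make the pattern concrete, here's a minimal sketch of what I mean, with hypothetical component names (`PartnerRef`, `BufferElem`) standing in for my real ones:

```csharp
// On EntityA: a reference to its partner EntityB.
public struct PartnerRef : IComponentData { public Entity Partner; }
// The element type of EntityB's DynamicBuffer.
public struct BufferElem : IBufferElementData { public float Value; }

// Inside a SystemBase.OnUpdate:
var bufferFromEntity = GetBufferFromEntity<BufferElem>(isReadOnly: false);
Entities
    .WithNativeDisableParallelForRestriction(bufferFromEntity)
    .ForEach((in PartnerRef partner) =>
    {
        // Random access through BufferFromEntity: a likely cache miss per lookup.
        var buffer = bufferFromEntity[partner.Partner];
        buffer.Add(new BufferElem { Value = 1f });
    }).ScheduleParallel();
```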
There are two assumptions I can make about this data:
EntityA and EntityB are partners. EntityA will only ever write to EntityB’s buffer.
EntityB’s archetype never changes. Once it’s created, it should never leave its original chunk.
I’m wondering if there’s any way for these two entities to share ownership of this data, so that I don’t have to suffer cache misses when EntityA writes to EntityB’s buffer.
Could both of these entities have a Component which contains a pointer to the same data, via an unsafe collection? That sounds like it would create a cache miss in both directions.
Yes, but that would have the opposite problem: When it comes time for EntityB’s components to read from that buffer, they would need to grab it from EntityA.
That would also involve a BufferFromEntity, and also incur a cache miss.
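For reference, the shared-pointer idea would look something like the sketch below (a sketch only, assuming the unsafe collections in Unity.Collections; `SharedBuffer` is a hypothetical component):

```csharp
using Unity.Collections;
using Unity.Collections.LowLevel.Unsafe;
using Unity.Entities;

// Both EntityA and EntityB hold the same pointer to heap-allocated data.
public unsafe struct SharedBuffer : IComponentData
{
    public UnsafeList<float>* List;
}

// At creation time (em is an EntityManager):
var list = UnsafeList<float>.Create(64, Allocator.Persistent);
em.AddComponentData(entityA, new SharedBuffer { List = list });
em.AddComponentData(entityB, new SharedBuffer { List = list });

// Dereferencing the pointer from either side is still a jump out of the
// chunk to heap memory, so it's a potential cache miss in both directions.
```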
I can’t just combine everything into a single Entity, because the chunk utilization would be very low.
Just to make sure: you know you don’t need to store DynamicBuffers in chunks?
If you store them on the heap, that may resolve the problem.
Unless your problem is regarding other components, which take up chunk space.
Unfortunately, this sounds like it would make the problem worse:
If EntityA writes to its own buffer, and then EntityB reads from EntityA’s buffer, then at least you’ll only have one cache miss - when EntityB needs random access to grab EntityA’s buffer.
But if the buffer lives on EntityC, then you’ll also have a cache miss when EntityA needs to write to it. You’ll be doubling your cache misses.
Unless I’ve misunderstood Unity’s explanations of how they work - using BufferFromEntity structures inherently implies random memory access, which by definition involves cache misses.
Looking at the BufferFromEntity source shows that they don’t cache any buffer data. Instead they grab it, via pointer, when you ask for it.
As for perf concerns: This is one of the most common patterns in this project’s code. It makes up the majority of work being done. I’m not one to preoptimize, but in this case it makes sense to do a little digging up front. My question here is targeted at any ECS data experts, who might be able to spot an obvious mistake I’m making, or an obvious way I could rearrange this data to avoid the cache misses.
There may not be one, but I know there are people here who are smarter than I am.
Getting data into jobs is severely limited; if you are not passing in NativeArrays or using ComponentFromEntity/BufferFromEntity, you have no other choices…
DOTS wants simple data to be efficient, but when your data structures get complicated, you have a problem.
It seems you have to spend time preparing that optimized data to feed to the job, or use alternative means.
That’s my impression so far.
Does every job need to be optimized to the max? You are already getting a big performance boost from the multicore architecture.
How large are your buffers? If you iterate linearly over them (.AsNativeArray()), all should be fine, no matter whether they are in the chunk or on the heap.
You just have the initial indirection for the heap pointer.
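In other words, something like this (a sketch, reusing a hypothetical `BufferElem` element type):

```csharp
// Linear iteration over each buffer: one indirection to reach the heap
// allocation, then contiguous reads/writes from there on.
Entities.ForEach((DynamicBuffer<BufferElem> buffer) =>
{
    var arr = buffer.AsNativeArray();
    for (int i = 0; i < arr.Length; i++)
        arr[i] = new BufferElem { Value = arr[i].Value * 2f };
}).ScheduleParallel();
```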
I am not objecting to the question. It is indeed good to ask; it gives others an opportunity to learn too. Myself included.
However, I have used DynamicBuffers extensively, without any performance issues on their part.
You could go down the pointer route, but do you want to sacrifice readability? Only if you really need to squeeze the max out of it. Buffers are really easy and nice to debug.
Depending on the structure of the data and the relation between entities (if data is common to many entities), you can also consider hash maps. But it is not convenient to modify existing data in hash maps.
I think you are worried about cache lines a bit too much. Everything is a cache miss. The only way to avoid the data being loaded twice is to do the write and then the read from the same thread, back to back. Any other architecture and the cache gets invalidated.
“Cache miss” performance is about ensuring that, given your task (function, code block, etc.), you don’t read/write all over the place in memory in a tight loop. When looping over EntityA doing writes to the buffers, writes should occur in a contiguous fashion over a contiguous block of memory. Further optimization requires knowing how big your cache lines are and how big the buffers are, and that’s something you should hold off on until it’s a bottleneck.
Instead of using BufferFromEntity, could you not wrap all the buffers into a single linear NativeArray? That’s what I did for a job and it sped it up quite a bit.
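Roughly like this, as one possible flattening approach (hypothetical names; `entities` is a NativeArray of the entities that own buffers, gathered beforehand):

```csharp
// Pass 1: compute per-entity offsets into one flat array.
int total = 0;
var offsets = new NativeArray<int>(entities.Length, Allocator.TempJob);
for (int i = 0; i < entities.Length; i++)
{
    offsets[i] = total;
    total += bufferFromEntity[entities[i]].Length;
}

// Pass 2: copy each buffer into the flat array at its offset.
var flat = new NativeArray<BufferElem>(total, Allocator.TempJob);
for (int i = 0; i < entities.Length; i++)
{
    var src = bufferFromEntity[entities[i]].AsNativeArray();
    NativeArray<BufferElem>.Copy(src, 0, flat, offsets[i], src.Length);
}

// Schedule the parallel job over `flat`, then copy results back if needed.
```

The up-front copy costs something, but the job itself then runs over one contiguous block.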