ECS Memory Layout

I’ve read the ECS features in detail section of the documentation and want to see if my understanding of the data layout for entities/components is correct.

Chunks
Data is stored by entity archetype in 16KB chunks.
A chunk is arranged as component streams: all of component A, followed by all of component B, and so on.

Is the chunk split up on creation such that the space for all component streams is already reserved? Like so:
[A, A, A, A, A, A][B, B, B, B, B, B]
even if there’s only one Entity? When you add an Entity, you just copy the component data straight to its relative index positions. This is pretty neat, as allocating n entities of an archetype is virtually a no-op.
Or do you compact the streams such that they occupy the memory like so
[A, A][B, B]
for two entities. If you add an Entity to this structure, then you have to move all the component streams down in memory to get [A, A, A][B, B, B]. I can’t imagine this would work anyway, as it would involve re-indexing all the entities?

Entities
All entities are stored in a single EntityData struct array. Entity.index is the index into this array, and EntityData provides a direct address to its components. Is an Entity struct also stored in the chunk so it can refer back to the entities array? Is this what EntityArray is generated from?
As a user can store Entity, am I right in assuming that the items in the entities array never change position? If you add 1000 entities and remove the first 999, that last entity is still going to be at the 1000th index?

Archetypes
If you add a new component to an Entity, it moves that Entity from its current chunk to a new chunk matching the new archetype. So there’ll be a chunk of memory for every possible archetype. If the user doesn’t specify a full archetype ahead of time, Unity will create one on demand along with a chunk for it. So adding a unique component to one Entity creates a new chunk just for that one entity.

ComponentDataArray
When we access the components via ComponentDataArray, are these direct pointers to the chunk data or are components copied into temp storage at the start of a system and back again at the end?
Looking at the source code for ComponentDataArray, the iterator jumps from chunk to chunk instead of being contiguous so I’d assume they’re direct pointers.

SharedComponentData (SCD)
An SCD is part of the archetype and each unique (by value, not type) instance of an SCD requires its own chunk. So an entity archetype will be split over as many chunks as there are unique SCDs.
The SCDs are stored in their own type arrays somewhere, not in the archetype chunk; the chunk just contains an index into that array.

Filtering
Does the SCD include metadata with references back to the chunks that match it?
So filtering on an SCD should be super quick depending on the Entity to unique SCD ratio.
If you had 1000 entities split up into units of 100 by SCD, then the filter would just search the 10 items in the SCD array and from that, can directly locate the relevant archetype chunks?

If I’m right on most of the above, then this structure is pretty damn awesome and I can see why creating and processing thousands of entities is so fast. I think I cleared up a lot of my own misunderstanding in the process of writing this out. Unless I’m completely wrong. :slight_smile:


Chunks
All components are preallocated. No need to move the components in a chunk.

Entities
Yes, in every chunk all entities are saved, and yes, that’s where EntityArray comes from. To the system they are treated very much like components.

Archetypes
Yes, that’s correct.

ComponentDataArray
Direct pointer into the chunk.

So far I haven’t looked deeply into SCD and filtering, but that is correct AFAIK.


Thanks Sgrueling

We know upfront, based on the components in the archetype, how many entities fit into one chunk. Thus adding an entity involves simply writing to the end of each stream, until the chunk is full, at which point we allocate/reuse a new chunk and fill that up.
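This capacity rule can be sketched in a few lines. This is a conceptual Python model, not Unity source: the 16KB figure comes from this thread, the component sizes are hypothetical, and chunk-header overhead (which makes real capacities slightly smaller) is ignored.

```python
# Sketch (not Unity source): fixed per-archetype chunk capacity and
# append-to-end entity insertion, ignoring chunk-header overhead.
CHUNK_SIZE = 16 * 1024  # 16KB

def entities_per_chunk(component_sizes):
    # Capacity is fixed per archetype: the total bytes per entity
    # across all component streams divides the chunk size.
    return CHUNK_SIZE // sum(component_sizes)

class Chunk:
    def __init__(self, component_sizes):
        self.capacity = entities_per_chunk(component_sizes)
        self.count = 0

    def try_add(self):
        # Adding an entity is just a write at the end of each stream;
        # a full chunk means a new chunk must be allocated/reused.
        if self.count == self.capacity:
            return False
        self.count += 1
        return True

# e.g. hypothetical Entity (8 bytes) + Position (12) + Heading (12):
print(entities_per_chunk([8, 12, 12]))  # 512 entities per 16KB chunk
```

With those made-up sizes, 16384 // 32 = 512 entities fit before a new chunk is needed.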

Yes. In fact we have essentially an Entity as the 0 component. This is what EntityArray is using internally.

Correct. In the future we will probably experiment with first allocating smaller chunks and, depending on how many chunks are in play for one archetype, starting to use larger chunk sizes. But so far it has not been a problem.

They are direct pointers. Data is contiguous within one chunk; when an index crosses into the next chunk, we calculate the next base address and continue iteration from there until we hit the end of that chunk.

IJobProcessComponentData is the most incredible iterator, since it’s literally the same speed as working with pointers directly. So you could have two components with floats and copy a to b, and it would result in the same code as a batched memcpy etc.

SCDs have their own manager with a freelist array of shared component data, and hash tables to quickly look up by value.

SetFilter internally just resolves to an int index, and each chunk internally has an array of the shared component indices used by that chunk. As a result, filtering is insanely cheap and done per chunk. There is no per-entity cost when using filtering.

It also means that SCDs have close to zero memory cost on a per-entity basis (just one int per chunk), since there is literally zero data stored for each entity.


Where can I learn more about these concepts? Something I can’t figure out about Unity ECS is how it stores the filtered arrays internally and how it keeps the data between arrays in sync (intuitively I would say that data is duplicated between arrays sharing the same components).

Not sure what you mean by keeping the arrays in sync? There’s no duplication of data under the hood.
The docs I linked to in my first post do describe the architecture, but I didn’t really understand it until I went through some of the source code. I’ll try to explain, to my knowledge anyway, how the component data is stored and iterated, and then how filtering is just a simple extension of that process.
It’s very clever in its simplicity how they’ve done it in my view.

ComponentDataArray Iteration
Data is stored primarily by its archetype. Say you have three entities with just the components Position and Heading. That’s one archetype. The three entities will be stored in the same chunk of memory like so:

ArchetypeA Chunk (Position, Heading)
[ [Entity, Entity, Entity, ..n][Position, Position, Position, ..n][Heading, Heading, Heading, ..n] ]
where n is the total number of entities a single chunk (16KB) can store.

If you have another two entities that also contain a Movement component, then they are stored in a different chunk of memory as it’s a different Archetype.

ArchetypeB Chunk (Position, Heading, Movement)
[ [Entity, Entity, ..n][Position, Position, ..n][Heading, Heading, ..n][Movement, Movement, ..n] ]

In your System, let’s say you request a group for just Position and Heading like so:

struct Group{
    int Length;
    EntityArray entity;
    ComponentDataArray<Position> position;
    ComponentDataArray<Heading> heading;
}
[Inject] Group group;

This group matches entities in both archetypes. Three from ArchetypeA and two from ArchetypeB.
group.Length will equal 5.
ComponentDataArray is an iterator, so group.position[n] is actually a function call. When you access group.position[0…n], it results in the following:

position[0] =    ArchetypeA.chunk[0].position[0];
position[1] =    ArchetypeA.chunk[0].position[1];
position[2] =    ArchetypeA.chunk[0].position[2];
position[3] =    ArchetypeB.chunk[0].position[0]; // Note the change here
position[4] =    ArchetypeB.chunk[0].position[1];

Internally, ComponentDataArray[ ] iterates the archetypes and chunks based on the array index passed in. Were you dealing with thousands of entities, then you’d see ArchetypeA.chunk[0…n].position[0…n] before it got to ArchetypeB.
To us, it just looks like one contiguous array.
That’s how it steps you through all the entities that match a particular component group. It iterates by Archetype, then chunk, then Component array, which is what it says in the docs. There’s no alignment or syncing required.
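The flat-index-to-chunk mapping described above can be modeled in a few lines. This is a hypothetical Python sketch of the lookup logic, not the actual ComponentDataArray source; the chunk names and counts reuse the five-entity example:

```python
# Sketch: mapping a flat ComponentDataArray index onto the matching
# archetypes' chunks, visited in order. Chunk names/counts are the
# hypothetical ones from the example above.
def locate(index, chunks):
    # chunks: list of (name, entity_count) in archetype/chunk order
    for name, count in chunks:
        if index < count:
            return name, index   # this chunk holds the element
        index -= count           # skip past this chunk's entities
    raise IndexError("index out of range")

chunks = [("ArchetypeA.chunk0", 3), ("ArchetypeB.chunk0", 2)]
print(locate(3, chunks))  # ('ArchetypeB.chunk0', 0)
```

This matches the table above: position[3] is the first element of ArchetypeB’s chunk, even though the caller sees one contiguous array.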

Filtering
I haven’t looked through the SharedComponentData filter side of the source code yet but I think it works like this, in principle at least.
If we add a SharedComponentData to ArchetypeA above, then for each unique value of that SharedComponentData, you get a new chunk.
Let’s say we add a MySharedComponent with a value of 1 to the first two entities and a MySharedComponent with a value of 2 to the third; this breaks the ArchetypeA chunk into two chunks like so:

ArchetypeA (Position, Heading, MySharedComponent=1)
[ [Entity, Entity, ..n][Position, Position, ..n][Heading, Heading, ..n] SharedComponentIndex ] // 2 entities in this chunk

ArchetypeA (Position, Heading, MySharedComponent=2)
[ [Entity, ..n][Position, ..n][Heading, ..n] SharedComponentIndex ] // 1 entity in here

(As Joachim said, all it stores for the SharedComponentData in each chunk is a single index value used to look up the single MySharedComponent for that chunk)
So the data is already stored in filtered chunks by virtue of having a SharedComponent, whether you filter it or not in the ComponentSystem. When you go:
ComponentGroup.SetFilter(MySharedComponent=1)
it doesn’t reorder any data. It just applies a few extra steps to the first process outlined above so that you only get chunks that match the filter. Your filtered data is already stored in contiguous chunks of memory.
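The per-chunk filter check can be sketched as follows. This is a conceptual Python model (not Unity source): the value table, chunk layout, and entity names are all made up to mirror the example above.

```python
# Sketch: per-chunk shared-component filtering. Each chunk stores one
# int index into a global shared-component value table; filtering
# compares that single int per chunk and never touches the entities.
shared_values = [None, {"value": 1}, {"value": 2}]  # hypothetical table

chunks = [
    {"shared_index": 1, "entities": ["e0", "e1"]},  # MySharedComponent=1
    {"shared_index": 2, "entities": ["e2"]},        # MySharedComponent=2
]

def filtered_chunks(chunks, shared_index):
    # One comparison per chunk; zero cost per entity.
    return [c for c in chunks if c["shared_index"] == shared_index]

for c in filtered_chunks(chunks, 1):
    print(c["entities"])  # ['e0', 'e1']
```

Because matching entities are already packed into their own chunks, the filter only has to select whole chunks; it never reorders or copies data.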
Pretty awesome.
Hopefully Joachim can correct me if I’m wrong on any of the above.
I don’t know what effect the job system has on the above process. Presumably ParallelFor could even assign individual chunks to threads, but I’ve no knowledge of that side.

Ideally someone with better graphics skills, or even Unity, will put together an image of the ECS memory architecture and add it to the docs, as I think it would greatly help everyone with an initial understanding of how ECS works. The architecture is so clean that a single graphic could explain it all at a glance.


@JooleanLogic thanks a lot for your time spent on this post! It makes things much clearer. I will need to spend a bit more time on understanding it, because I want to understand better how this archetype structure can affect the cache in the worst cases (or rather, how to avoid breaking the cache), but otherwise it makes much more sense to me now!

No probs sebas. Writing this stuff out helps me understand it better myself. There’s still a lot of details I’m unsure of.

If you’re about, Joachim, could you explain how index maps to m_Cache.CachedPtr below in ComponentDataArray?

public T this[int index]{
    get{
        ...
        return UnsafeUtility.ReadArrayElement<T>(m_Cache.CachedPtr, index);
    }
}

ReadArrayElement just takes a void* and an index, and I presume returns a T from the offset address, but I don’t see where in the above function index gets clamped to the begin/end range of the cached chunk? Am I missing it, or perhaps my understanding of this part is wrong.

If I go componentDataArray[10000] and that index is somewhere in chunk 5 of ArchetypeB, how is that index of 10000 mapped to m_Cache.CachedPtr? Does m_Cache.CachedPtr not point to the start of the Component array in the chunk? Wouldn’t index overflow?

Ah, never mind. I should’ve just stepped through the source code first.
m_Cache.CachedPtr is negatively offset such that CachedPtr + index gives the correct address of the current chunk element.
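The negative-offset trick can be illustrated with plain index arithmetic. This is a Python sketch of the idea, not the actual m_Cache implementation; the flat index range 3..5 reuses the five-entity example from earlier in the thread, and pointers are simulated with list offsets.

```python
# Sketch: the CachedPtr idea. The cached "pointer" is offset backwards
# by the chunk's beginning flat index, so that cached_base + index
# lands on the right element with no per-access subtraction.
chunk_data = [100, 101]        # ArchetypeB.chunk0 positions (flat indices 3..4)
cached_begin, cached_end = 3, 5
cached_base = -cached_begin    # "pointer" moved back by the begin index

def read(index):
    # The begin/end range check happens when the cache is updated to a
    # new chunk, not on every element access.
    assert cached_begin <= index < cached_end
    return chunk_data[cached_base + index]

print(read(3), read(4))  # 100 101
```

So the indexer never clamps per access; it only recomputes cached_base (and the begin/end range) when the flat index leaves the current chunk.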

Given the following (pseudo, hypothetical) code:

struct MySharedComponent : ISharedComponentData
{
  public enum PossibleValues {
    One,
    Two,
    Three,
    ...,
    Ten
  }
  public PossibleValues Value;
}
class MySystem : ComponentSystem
{
  void OnCreateManager(int capacity)
  {
    var values = Enum.GetValues(typeof(MySharedComponent.PossibleValues));
    foreach (var v in values)
    {
      var entityManager = ..;
      var entity = entityManager.CreateEntity(typeof(MySharedComponent));
      entityManager.SetSharedComponentData(entity, new MySharedComponent { Value = (MySharedComponent.PossibleValues)v });
    }
  }
}

what memory layout would I end up with?

In my current understanding, each shared component instance with a unique value will result in a 16KB chunk allocation. In the example above I create 10 entities, each with a shared component holding a different value. Does this result in 10 × 16KB = 160KB of memory being allocated for just 10 entities, each with a unique MySharedComponent?

If so, does Unity plan to optimize this later with variable chunk sizes? Not all data types require big values of n, but it’s still useful having them in the ECS data structure.

I am implementing an event system where an event is represented as a single entity with an Event shared component on it. If there are 100 possible different Event values, I don’t want to end up with 100*16kb of memory allocated.

Thanks!


Yes, you would end up with 10 mostly empty chunks; that’s a consequence of how shared components work. Supporting variable chunk sizes is something we are looking into.
It’s very convenient and potentially fast to query for entities with specific shared components but it can be faster to use IComponentData and just search through the components.
It depends on how many different shared component values and entities you end up with.
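For the record, the arithmetic from the question above (Python; the numbers come straight from the 10-value example in this thread):

```python
# One mostly empty 16KB chunk per unique shared-component value.
CHUNK_SIZE = 16 * 1024          # 16KB per chunk
unique_shared_values = 10       # ten distinct MySharedComponent values
allocated = unique_shared_values * CHUNK_SIZE
print(allocated // 1024, "KB")  # 160 KB
```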


Thanks for clarifying.

My understanding is that entities are stored entirely within one of the chunks belonging to their archetype.
What happens when an entity is larger than a single chunk?
How are entities with lots of data/arrays handled in general(or is having them a sign of a fundamental problem)?
Just starting off with ECS so excuse me if I’m missing something obvious :slight_smile:

Does anyone know of a relatively new pure ECS demo with a bit more complexity than boids?

What happens when an entity is larger than a single chunk?
Unity throws an exception. I am not sure how you would create a single entity with more than 16KB of data on it.

Do note that DynamicBuffer<> elements are allocated outside the chunk when the buffer exceeds its default capacity (which is generally recommended to be quite small).
Thus most of such large array data will not be located in the chunk itself.
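The small-buffer behaviour described here can be sketched like so. This is a conceptual Python model of the idea, not Unity’s DynamicBuffer implementation; the capacity numbers are made up.

```python
# Sketch: the small-buffer idea behind DynamicBuffer. Elements live
# inline in the chunk up to a small internal capacity; past that,
# the data conceptually spills to an out-of-chunk allocation.
class DynamicBufferSketch:
    def __init__(self, internal_capacity=8):
        self.internal_capacity = internal_capacity  # hypothetical default
        self.elements = []

    def add(self, x):
        self.elements.append(x)

    @property
    def in_chunk(self):
        # True while the data still fits in the chunk-resident storage.
        return len(self.elements) <= self.internal_capacity

buf = DynamicBufferSketch(internal_capacity=2)
buf.add(1); buf.add(2)
print(buf.in_chunk)  # True: still inline in the chunk
buf.add(3)
print(buf.in_chunk)  # False: data now lives outside the chunk
```

This is why a large per-entity array doesn’t blow past the 16KB chunk limit: only the small inline portion (and a header) occupies chunk space.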

Are or will we be able to set a custom chunk size for a particular archetype?

I believe Unity has said no. Chunks are always the minimum standard size of an L1 or L2 cache (can’t remember which atm).

Sizing up a chunk could make it unable to fit entirely in the cache on some CPUs. Sizing it down probably wouldn’t be a problem, but I can imagine Unity not considering this a common case. What are you aiming to do?


Interesting, where did you hear this? Source?

The only thing I know says they might consider it. OK ECS works in 16K chunks but what is the instruction size limit in ECS? - Unity Engine - Unity Discussions

I would like to allow bigger buffers to be stored in the chunk, so ideally I could change the chunk size for a specific archetype.

A cache line is 64 bytes on most platforms, so no, that is not the case.

We are looking at supporting different chunk sizes at some point. We haven’t decided on exactly how yet.

1 Like

It would be great to be able to somehow specify a modifier at IComponentData level so that any archetype with the component would increase/decrease the chunk size. I have a use case where entities with a specific component will be quite large but depending on some other components there will either be thousands or just a few entities per archetype.

Could this DynamicBuffer be used to store joint transform matrices for a pose component, for example? Where in memory is the out-of-chunk allocation, and what are the performance implications? Is there a more efficient way to compute poses?

Sorry, I’m new to Unity :(.