How do archetypes work?

Edit: The answer to this question can be found here: ECS concepts | Package Manager UI website

Just out of curiosity: I’ve been immersing myself in data-oriented programming and am curious how archetypes are implemented in terms of optimal memory / cache-line usage.

The following is from the ECS samples package, the HelloCube -> 3. IJobChunk example. An archetype is created for a job that requires a pair of components, namely Rotation and RotationSpeed_IJobChunk:

    [BurstCompile]
    struct RotationSpeedJob : IJobChunk
    {
        public float DeltaTime;
        public ArchetypeChunkComponentType<Rotation> RotationType;
        [ReadOnly] public ArchetypeChunkComponentType<RotationSpeed_IJobChunk> RotationSpeedType;

        public void Execute(ArchetypeChunk chunk, int chunkIndex, int firstEntityIndex)
        {
            var chunkRotations = chunk.GetNativeArray(RotationType); // what happens in memory when you do this? how does this work with the cache line?
            var chunkRotationSpeeds = chunk.GetNativeArray(RotationSpeedType);
            for (var i = 0; i < chunk.Count; i++)
            {
                var rotation = chunkRotations[i];
                var rotationSpeed = chunkRotationSpeeds[i];

                // Rotate something about its up vector at the speed given by RotationSpeed_IJobChunk.
                chunkRotations[i] = new Rotation
                {
                    Value = math.mul(math.normalize(rotation.Value),
                        quaternion.AxisAngle(math.up(), rotationSpeed.RadiansPerSecond * DeltaTime))
                };
            }
        }
    }

When executing over a large set of component pairs, it makes sense to arrange the pairs in memory so that job iteration reads as many of them as possible from contiguous memory and makes full use of each cache line. So in this case the ideal layout would seemingly be:
chunkRotations[0] - chunkRotationSpeeds[0] - chunkRotations[1] - chunkRotationSpeeds[1] - chunkRotations[2] - chunkRotationSpeeds[2] - etc.

Am I correct in assuming that defining an archetype arranges the components in memory in such a way?
The two consecutive chunk.GetNativeArray() calls at the start of Execute() seem to imply that chunkRotations and chunkRotationSpeeds are separate arrays, which would cause a lot of cache misses when iterated over, especially as the number of components grows.
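To make the comparison concrete, here's a toy sketch of the two layouts I have in mind (purely illustrative, not Unity's actual chunk implementation; it assumes the same usings and component types as the job above):

    // Layout A: component pairs interleaved in a single array (array-of-structures):
    //   rot0, speed0, rot1, speed1, rot2, speed2, ...
    struct InterleavedPair
    {
        public Rotation Rotation;
        public RotationSpeed_IJobChunk RotationSpeed;
    }
    // InterleavedPair[] pairs;

    // Layout B: one tightly packed array per component type (structure-of-arrays),
    // which is what the two GetNativeArray() calls appear to hand back:
    //   rotations:      rot0, rot1, rot2, ...
    //   rotationSpeeds: speed0, speed1, speed2, ...
    struct SeparatePerTypeArrays
    {
        public NativeArray<Rotation> Rotations;
        public NativeArray<RotationSpeed_IJobChunk> RotationSpeeds;
    }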

My understanding of memory layouts and cache utilization is very basic at this point, so any insight that helps me understand how to use these systems optimally would be much appreciated.


I am also not an expert on CPU caches, but my understanding is that the two arrays will be loaded into separate cache lines. Since the access pattern is linear, cache misses should be minimal (roughly one miss per 64-byte cache line, per array), and temporal locality will keep the data in cache for a while.
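Some rough back-of-the-envelope numbers, assuming 64-byte cache lines and the component sizes from the sample (Rotation wraps a quaternion, i.e. 4 floats = 16 bytes; RotationSpeed_IJobChunk holds a single float = 4 bytes):

    // Sketch only (runs as a C# top-level program): how often linear iteration over
    // the two separate arrays should need to pull in a new cache line.
    const int cacheLineBytes = 64;
    const int rotationBytes = 16;      // Rotation wraps a quaternion: 4 * sizeof(float)
    const int rotationSpeedBytes = 4;  // RotationSpeed_IJobChunk: 1 * sizeof(float)

    System.Console.WriteLine($"Rotations per cache line:      {cacheLineBytes / rotationBytes}");       // 4
    System.Console.WriteLine($"RotationSpeeds per cache line: {cacheLineBytes / rotationSpeedBytes}");   // 16

    // So iterating a chunk in order misses roughly once every 4 entities on the rotation
    // array and once every 16 entities on the speed array; the hardware prefetcher
    // typically hides much of even that.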

Digging into this further, my understanding is that an archetype is nothing more than a unique combination of component types: each distinct combination of components is its own archetype. When you create an entity prefab you're effectively just defining an archetype (and another entity prefab with the same combination of components would belong to that same archetype).
Each archetype is divided into chunks, which are stored in contiguous memory.
I suppose the compiler could perform further optimization, rearranging the data in memory based on what it can derive from the implementation of the job and system.

The process is described here: ECS concepts | Package Manager UI website
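For concreteness, here's a minimal sketch (the system class and variable names are mine; RotationSpeed_IJobChunk is the sample's component) of explicitly creating that archetype and filling its chunks with entities:

    using Unity.Entities;
    using Unity.Transforms;

    // Sketch only: the unique combination {Rotation, RotationSpeed_IJobChunk} defines one
    // archetype; every entity created from it shares that archetype and gets packed into
    // its chunks.
    public class ArchetypeExampleSystem : ComponentSystem
    {
        protected override void OnCreate()
        {
            EntityArchetype rotatingArchetype = EntityManager.CreateArchetype(
                typeof(Rotation),
                typeof(RotationSpeed_IJobChunk));

            for (int i = 0; i < 1000; i++)
                EntityManager.CreateEntity(rotatingArchetype);
        }

        protected override void OnUpdate() { }
    }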