Edit: The answer to this question can be found here: ECS concepts | Package Manager UI website
Just out of curiosity: I've been immersing myself in DoP and am wondering how Archetypes are implemented in terms of memory layout and cache line usage.
The following is from the ECS samples package, the HelloCube -> 3. IJobChunk example. An archetype is created to execute a job that requires a pair of components, namely Rotation and RotationSpeed_IJobChunk:
[BurstCompile]
struct RotationSpeedJob : IJobChunk
{
    public float DeltaTime;
    public ArchetypeChunkComponentType<Rotation> RotationType;
    [ReadOnly] public ArchetypeChunkComponentType<RotationSpeed_IJobChunk> RotationSpeedType;

    public void Execute(ArchetypeChunk chunk, int chunkIndex, int firstEntityIndex)
    {
        var chunkRotations = chunk.GetNativeArray(RotationType); // what happens in memory when you do this? how does this work with the cache line?
        var chunkRotationSpeeds = chunk.GetNativeArray(RotationSpeedType);
        for (var i = 0; i < chunk.Count; i++)
        {
            var rotation = chunkRotations[i];
            var rotationSpeed = chunkRotationSpeeds[i];

            // Rotate something about its up vector at the speed given by RotationSpeed_IJobChunk.
            chunkRotations[i] = new Rotation
            {
                Value = math.mul(math.normalize(rotation.Value),
                    quaternion.AxisAngle(math.up(), rotationSpeed.RadiansPerSecond * DeltaTime))
            };
        }
    }
}
When executing over a large set of component pairs, it makes sense to arrange the pairs contiguously in memory so that each cache line fetched during iteration holds as much useful data as possible. So in this case I would expect the ideal memory layout to be:
chunkRotations[0] - chunkRotationSpeeds[0] - chunkRotations[1] - chunkRotationSpeeds[1] - chunkRotations[2] - chunkRotationSpeeds[2] - etc.
Am I correct in assuming that defining an Archetype arranges the components in memory in such a way?
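To make the two candidate layouts concrete, here is a minimal sketch in plain C# with no Unity types. It only computes byte offsets for an interleaved layout versus two separate arrays packed back to back in one block. The class name, the method names, and the sizes (a 16-byte quaternion for the rotation, a 4-byte float for the speed) are assumptions for illustration, not how the ECS package itself is written.

```csharp
using System;

// Hypothetical stand-in for the two layouts being compared.
// Sizes are assumptions: rotation = 4 floats (16 bytes), speed = 1 float (4 bytes).
static class LayoutSketch
{
    public const int RotationSize = 16;
    public const int RotationSpeedSize = 4;

    // Interleaved layout: rot0, speed0, rot1, speed1, ...
    public static int InterleavedRotationOffset(int i) =>
        i * (RotationSize + RotationSpeedSize);

    public static int InterleavedSpeedOffset(int i) =>
        InterleavedRotationOffset(i) + RotationSize;

    // Separate arrays in one block: rot0 .. rot(count-1), then speed0 .. speed(count-1)
    public static int SeparateRotationOffset(int i) =>
        i * RotationSize;

    public static int SeparateSpeedOffset(int i, int count) =>
        count * RotationSize + i * RotationSpeedSize;

    public static void Main()
    {
        const int count = 4;
        for (int i = 0; i < count; i++)
        {
            Console.WriteLine(
                $"entity {i}: interleaved rot@{InterleavedRotationOffset(i)} speed@{InterleavedSpeedOffset(i)}" +
                $" | separate rot@{SeparateRotationOffset(i)} speed@{SeparateSpeedOffset(i, count)}");
        }
    }
}
```

In the interleaved case the two components of one entity are always adjacent. In the separate case each component type forms its own contiguous run, so a loop that walks index i upward reads two linear streams instead of one.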
The two consecutive chunk.GetNativeArray() calls at the start of Execute() seem to imply that chunkRotations and chunkRotationSpeeds are separate arrays, which I would expect to cause a lot of cache misses when iterating, especially as the number of components grows.
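One way to sanity-check that worry is to count how many distinct cache lines a linear pass would touch under each layout. The sketch below is plain C# and rests on several assumptions: 64-byte cache lines, a 16-byte rotation, a 4-byte speed, 1024 entities, and data starting on a cache line boundary. All names are hypothetical, and it models only which lines are touched, not prefetching or any other hardware behaviour.

```csharp
using System;
using System.Collections.Generic;

static class CacheLineSketch
{
    public const int CacheLineSize = 64; // assumption: common on current desktop CPUs

    // Counts the distinct cache lines touched when reading `count` elements of
    // `size` bytes each, placed `stride` bytes apart, starting at byte `start`.
    public static int LinesTouched(int start, int stride, int size, int count)
    {
        var lines = new HashSet<int>();
        for (int i = 0; i < count; i++)
        {
            int begin = start + i * stride;
            int firstLine = begin / CacheLineSize;
            int lastLine = (begin + size - 1) / CacheLineSize;
            for (int line = firstLine; line <= lastLine; line++)
                lines.Add(line);
        }
        return lines.Count;
    }

    public static void Main()
    {
        const int count = 1024;
        const int rot = 16, speed = 4;

        // A job reading both components.
        int interleavedBoth = LinesTouched(0, rot + speed, rot + speed, count);
        int separateBoth = LinesTouched(0, rot, rot, count)
                         + LinesTouched(count * rot, speed, speed, count);

        // A job reading only the rotation.
        int interleavedRotOnly = LinesTouched(0, rot + speed, rot, count);
        int separateRotOnly = LinesTouched(0, rot, rot, count);

        Console.WriteLine($"both components: interleaved={interleavedBoth}, separate={separateBoth}");
        Console.WriteLine($"rotation only:   interleaved={interleavedRotOnly}, separate={separateRotOnly}");
    }
}
```

Under these assumptions, a pass that reads both components touches 320 lines in either layout, because the total number of bytes is the same and both layouts are read front to back. The layouts only differ when a job needs a single component: the interleaved layout still pulls in all 320 lines, while the separate arrays need only the 256 lines that hold rotations.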
My understanding of memory layouts and cache utilization is very basic at this point, but any insight would help me better understand how to use these systems optimally and would be much appreciated.