Cellular Automata with DOTS

I’ve been toying around with a cellular automata implementation in DOTS but can’t find an elegant and fast solution.

In essence, I would like to run a job that updates the state for each cell depending on the state of its neighbors. My initial approach was to pass a ComponentDataFromEntity to the job, but of course DOTS won’t let me read+write to the same set of components in parallel.

How could I go about passing the old (previous frame) CellComponent data as an input for my job, while allowing the job to overwrite the new (next frame) CellComponent in parallel? Something like (pseudo code):

struct CellularAutomaton : IJobForEachWithEntity<CellComponent>
{
  [ReadOnly] public ComponentDataFromEntityCLONED<CellComponent> previousFrameData;

  public void Execute(Entity entity, CellComponent cellComponent)
  {
     /* update cellComponent using previousFrameData of my neighbors */
  }
}

I can make it work using EntityQuery.ToComponentDataArray and EntityQuery.CopyFromComponentDataArray but it seems a bit too slow to me.

Would you choose another approach altogether?

What you’re describing is a multi-step process. Even though it can be done in one job, it’s still important to understand how this multi-step process works: you essentially want to calculate the new value, but not apply it immediately.

There are multiple approaches to this, but in my view this would be the simplest:

Have one component: CellComponent

Using an EntityCommandBuffer, you’ll then have a job that calculates each cell’s new value but doesn’t apply it immediately. When you submit changes to the EntityCommandBuffer, the values are only applied at the sync point where the ECB plays back.

If this is still too slow, there might be some value in building up a structure that enables faster querying of the data in question before you run your job, but that likely gets complex. The above solution will still allow the entirety of your job to run, knowing that you won’t overwrite data that you’ll need for your calculations.

I would just add the previous cell state component to your entities.
Adding [NativeDisableParallelForRestriction] seems reasonable if you only modify the cell you are visiting and only read the adjacent cells’ previous-state components.
At the end of the frame you can then have a system setting all previous states to current states.
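
Rough sketch of that idea (untested; the component and job names are just for illustration):

// PreviousCell mirrors CellComponent and is only ever read during the update job.
public struct PreviousCell : IComponentData
{
    public float state;
}

[BurstCompile]
struct UpdateCells : IJobForEachWithEntity<CellComponent>
{
    // Read-only lookup, so there is no write hazard even though every cell reads its neighbors.
    [ReadOnly] public ComponentDataFromEntity<PreviousCell> previous;

    public void Execute(Entity entity, int index, ref CellComponent cell)
    {
        // ... update cell.state from the neighbors' `previous` values ...
    }
}

[BurstCompile]
struct CopyCurrentToPrevious : IJobForEach<CellComponent, PreviousCell>
{
    public void Execute([ReadOnly] ref CellComponent cell, ref PreviousCell previous)
    {
        previous.state = cell.state; // scheduled after UpdateCells, at the end of the frame
    }
}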

My proposed solution would not require write permission on the current entity’s cell component. I would assume it’s there to serve as the initial value to be modified, not completely overwritten. The use of an EntityCommandBuffer means the write operation is delayed until that sync point, so there’s no need to require [NativeDisableParallelForRestriction], since all you’re doing is reading, not writing.

I mentioned that it’s a multistep process as the new value for CellComponent can only be written AFTER all entities have their new CellComponent values calculated. It could be possible to buffer these values yourself by writing the value to a different component and setting up another job that’ll copy the value back after the main calculation job is finished, but that’s essentially what an EntityCommandBuffer achieves already.

All that said, you might find that having some custom data structure that takes care of the calculations and eventually writes out your data as entities with components is a more optimal approach. It mostly depends on how “alive” this algorithm should feel. If it’s being used for level generation, there’s little need to have all the data the generation process needs managed by the entity manager. On the other hand, if it’s a core part of gameplay that has to interact with the data, there’s more to gain by managing it with the entity manager.

Great feedback! :smile:

I agree both suggested approaches could be considered multistep (or “double buffered”) processes. Also, both are quite easy to implement which is nice!

Doing a quick test implementation, I couldn’t get the EntityCommandBuffer working with BurstCompile so I think the separate component approach is the better/faster option right now.

Thanks a lot!

@kingstone426 did you use the concurrent version of the ECB in your solution? I suspect if you were using the ECB as is you would’ve had issues because of an IJobForEach running in parallel.

Yes I am using EntityCommandBuffer.Concurrent which seems to be working ok except it prevents burst compilation.

Processing 1 million entities takes around 6ms using the “two component approach” but more than 1200ms (!) using the “ECB approach”. Even with burst disabled, the “two component approach” takes only 220ms, so maybe my implementation is completely bonkers.

Here is the code for the “two component approach”:
(I decided to use a Next-component instead of a Previous-component, but I suppose it’s the same. And I love generics!)

using Unity.Burst;
using Unity.Collections;
using Unity.Entities;
using Unity.Jobs;

[assembly: RegisterGenericComponentType(typeof(Next<CellComponent>))]

public class CellularAutomatonTwoComponents : JobComponentSystem
{
    protected override void OnCreate()
    {
        using (var tempArray = new NativeArray<Entity>(1000000, Allocator.Temp))
        {
            EntityManager.CreateEntity(EntityManager.CreateArchetype(typeof(CellComponent),typeof(Next<CellComponent>)), tempArray);
        }
    }

    protected override JobHandle OnUpdate(JobHandle inputDeps)
    {
        var jobHandle = new ProcessForEach() {cells = GetComponentDataFromEntity<CellComponent>()}.Schedule(this, inputDeps);
        jobHandle = new FlipBuffersJob().Schedule(this, jobHandle);
        jobHandle.Complete();
        return jobHandle;
    }
}

public struct CellComponent : IComponentData
{
    public float state;
}

public struct Next<T> : IComponentData where T : struct, IComponentData
{
    public T value;
}

[BurstCompile]
public struct ProcessForEach : IJobForEachWithEntity<Next<CellComponent>>
{
    [ReadOnly] public ComponentDataFromEntity<CellComponent> cells;
  
    public void Execute(Entity entity, int index, ref Next<CellComponent> next)
    {
        next.value.state = cells[entity].state + 1;
    }
}

[BurstCompile]
public struct FlipBuffersJob : IJobForEachWithEntity<CellComponent, Next<CellComponent>>
{
    public void Execute(Entity entity, int index, ref CellComponent cell, [ReadOnly] ref Next<CellComponent> next)
    {
        cell = next.value;
    }
}

And here is the code for the “ECB approach”:

using Unity.Collections;
using Unity.Entities;
using Unity.Jobs;

public class CellularAutomatonEntityCommandBuffer : JobComponentSystem
{
    private EntityCommandBufferSystem entityCommandBufferSystem;
  
    protected override void OnCreate()
    {
        // Entities are instantiated in CellularAutomatonTwoComponents
      
        entityCommandBufferSystem = World.GetOrCreateSystem<EndSimulationEntityCommandBufferSystem>();
    }

    protected override JobHandle OnUpdate(JobHandle inputDeps)
    {
        var jobHandle = new ProcessForEachCommandBuffer() { cells = GetComponentDataFromEntity<CellComponent>(), commandBuffer = entityCommandBufferSystem.CreateCommandBuffer().ToConcurrent() }.Schedule(this, inputDeps);
        entityCommandBufferSystem.AddJobHandleForProducer(jobHandle);
        jobHandle.Complete();
        return jobHandle;
    }
  
    //[BurstCompile]
    private struct ProcessForEachCommandBuffer : IJobForEachWithEntity<CellComponent>
    {
        [ReadOnly] public ComponentDataFromEntity<CellComponent> cells;
        public EntityCommandBuffer.Concurrent commandBuffer;

        public void Execute(Entity entity, int index, [ReadOnly] ref CellComponent cellComponent)
        {
            commandBuffer.SetComponent(index, entity, new CellComponent { state = cells[entity].state + 1 });
        }
    }
}

Please let me know if something is completely out of wack with my tests! :slight_smile:

Disclaimer: I’m new to DOTS, but I had this problem so I’ve experimented a bit. However, that was on Entities 0.1/0.2, so I’m not sure how it’d look in the new version.

The problem is highly dependent on your exact case. For example, if you are doing something like Game of Life, which is a grid, the solution is fundamentally different than if you need to find ‘nearest neighbor’.

My original version, pre-DOTS, was a simple sequential solution that worked over arrays (jagged, multi, or single). Inside Unity, this capped out my frame rate very quickly… maybe 150x150 grid (IIRC).

My first ECS/DOTS version removed arrays and instead created an entity for every grid/array position. Each entity had a grid position (x,y) component and also had 2-4 components that represented the neighbors. I later changed this to 3-8 for bitmasking tests. I also had current state and future state components.

Then I had a system that ran for every entity that had a particular “neighbor” component (this eliminated edge cases - e.g., 0,0 would have no “left/down” neighbor entity at -1,0, 0,-1 or -1,-1). I had 4-8 systems, depending on the case, each looking at entities with one particular neighbor. Each would += a value onto the future state of the entity being calculated. For example, in the bitmask case I had 8 systems, each incrementing the future state by its own value (1, 2, 4, 8, 16, 32, 64, 128). I’d ensure these completed, then run a system to change the current state to the future state - for example, overwrite the current bitmask with the newly calculated bitmask. In my case, I’d also reset the future state bitmask.

As an aside, I ended up doing it my way because I needed to reduce the work on one frame and was willing to process only “one direction” per frame, something that most cases wouldn’t allow. Using dynamic buffers might have worked better otherwise. Using entities was certainly faster (by a factor of ~100) regardless.
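
A minimal sketch of one of those per-neighbor systems (untested; the component names UpNeighbor, CurrentState and FutureState are made up for illustration):

public struct UpNeighbor : IComponentData { public Entity value; }
public struct CurrentState : IComponentData { public int value; }
public struct FutureState : IComponentData { public int value; }

[BurstCompile]
struct AccumulateUpNeighbor : IJobForEach<UpNeighbor, FutureState>
{
    [ReadOnly] public ComponentDataFromEntity<CurrentState> currentStates;

    public void Execute([ReadOnly] ref UpNeighbor up, ref FutureState future)
    {
        // In the bitmask variant this would add a fixed bit value (1, 2, 4, ...) when the
        // neighbor is alive; here it just sums live neighbors (state 0 or 1).
        future.value += currentStates[up.value].value;
    }
}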

My next, and current, solution was to use persistent native arrays. My actual case is more involved than the Game of Life “alive/dead”, but it’s similar in nature. I would use two native arrays, with the position in the array corresponding to its 2D grid position. One array holds the current state, and the other holds the ‘future state’. I use an IJobParallelFor, passing in the two arrays and the width of the grid (future state = read/write, current state = read-only, width of grid = read-only). Using the index that is passed in, I check each neighbour and set the future state. In my case, I would then draw the grid according to futureState (comparing it to the current state first - no change means no redraw - as I’m using Unity’s Tilemap), then overwrite currentState with futureState.

NativeArrays work better because they can be jobified and Burst-compiled - ECS actually slows things down. That applies while it is a ‘simple’ grid problem.


I tried quite a few things, including ECB, but all of it was vastly inferior to using a multi-step approach. So, for a DOTS implementation, I would start with:

  1. Component with current state (byte ?)
  2. Component with neighbor references (entities)
  3. Component with future state (byte ?)

Then, for all entities with a current state, pass in (read-only) current state, (read-only) neighbor references, and (read/write) future state. Using IJobForEachWithEntity, you can reference the current state of neighbors via currentState[neighbourEntityReference], and write/increment the future state value (you may have to trick it by writing to futureState[entity] - the current entity - in many cases).

Once that completes, you can run a job passing in the current and future state, and just overwriting currentState from futureState.
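
Something along these lines, reusing the CurrentState/FutureState components from the sketch above (untested; the Neighbors component is made up for illustration):

public struct Neighbors : IComponentData
{
    public Entity up, down, left, right; // extend to 8 references for diagonals
}

[BurstCompile]
struct CalculateFutureState : IJobForEachWithEntity<Neighbors, FutureState>
{
    [ReadOnly] public ComponentDataFromEntity<CurrentState> currentState;

    public void Execute(Entity entity, int index, [ReadOnly] ref Neighbors n, ref FutureState future)
    {
        var total = currentState[n.up].value + currentState[n.down].value
                  + currentState[n.left].value + currentState[n.right].value;
        future.value = total; // apply whatever CA rule you need here
    }
}

// Runs once the calculation job has completed:
[BurstCompile]
struct ApplyFutureState : IJobForEach<CurrentState, FutureState>
{
    public void Execute(ref CurrentState current, [ReadOnly] ref FutureState future)
    {
        current.value = future.value;
    }
}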

However, the native array approach was easily 10x faster for me. It’s identical in principle, except you pass in the arrays instead of components, and have no need for neighbor references (but will have out of bounds safety checks in the job logic).

Besides the weird issue you’re having getting Burst to work on EntityCommandBuffers, what you have is accurate. EntityCommandBuffers are slower than your extra-component approach. With an EntityCommandBuffer, you are doing a copy in parallel into the buffer, also writing metadata about the component type, the sort index, and which entity to target. Then the buffer sorts by index and has to play back each command on a single thread, decoding each command individually. Compare that to the extra-component double-buffer solution, which boils down to just two parallel memcpy operations.

If you don’t want to have a second component, you can use a NativeContainer instead, which should have equivalent performance. Often when I am dealing with interacting entities like this, I copy the component data into an acceleration structure, and then write results directly to the components.

I might just have been using an older Burst version that did not support SetComponent. I am working from another machine right now so I can’t confirm.

It seems logical that ECB is slower for the reasons you mention and perhaps an unfortunate drawback for ECBs in general.

I agree with DreamingImLatios here: no matter how optimized command buffers are, it’s always going to be slower to store a billion instructions to process later than to process the data directly.

Also, in a real use case, at some point you’re gonna need to access current and previous positions in some systems (maybe for animating), and the data is already there, whereas with an ECB you are hiding this information inside commands.

Mind giving a short demonstration on how you go about writing the results to the components in a performant way? I’ve always struggled with this part given the current API. :frowning:

I’ve come across this type of problem before; one approach is to use NativeArrays and limit the reading and writing to separate systems.

So you would have:

NativeArray currentValues;
NativeArray newValues;

System one takes in read-only currentValues and outputs write-only newValues; this lets the system scan around the current cell without hitting boundary issues or overwriting data that will be resampled.

Then all you need to do is swap newValues and currentValues and repeat; this may need to be done in a system or via a memcpy operation.

I think the technique is often called double buffering as you use two buffers to prevent overlapping data issues.
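
A tiny sketch of the swap step (assuming both buffers are persistent and owned by the system):

// After the write job completes, newValues holds the fresh state; swapping the
// references makes it the read buffer for the next iteration, so no copy is needed.
var temp = currentValues;
currentValues = newValues;
newValues = temp;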

Updated to Unity 2019.3.0f3 with the latest Entities (0.4.0-preview.10) and Burst (1.2.0-preview.11) packages, and ECB now works with Burst.

CellularAutomatonEntityCommandBuffer dropped down to around 9ms; however, that system only populates the ECB. EndSimulationEntityCommandBufferSystem still hogs around 900ms because ECB playback is not parallelized.

CellularAutomatonTwoComponents is steady at around 5ms and clearly the faster implementation!

I just use ComponentDataFromEntity. Using an acceleration structure means I am willing to pay the price of random access copies in and out of the structure in order to greatly improve the speed of a more computationally complex algorithm.
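
For what it’s worth, the write-back step could look something like this (untested sketch, not necessarily how DreamingImLatios structures it; assumes parallel entities/results arrays produced by the processing step):

[BurstCompile]
struct WriteResults : IJobParallelFor
{
    [ReadOnly] public NativeArray<Entity> entities;
    [ReadOnly] public NativeArray<CellComponent> results;

    // Safe to disable the restriction here because each index writes to a different entity.
    [NativeDisableParallelForRestriction] public ComponentDataFromEntity<CellComponent> cells;

    public void Execute(int index)
    {
        cells[entities[index]] = results[index];
    }
}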

@kingstone426 It depends on your implementation, but if you keep your entities in a NativeArray and can guarantee that the entities won’t change their archetype, you can use EntityQuery.ToComponentDataArray<T>(Allocator), which just copies the chunks’ components together; you can then access them with the same index as the entity NativeArray.

Holy smokes! Using EntityQuery.ToComponentDataArray(Allocator), we are now down to 4ms! :smile:

Here’s the code:

using Unity.Burst;
using Unity.Collections;
using Unity.Entities;
using Unity.Jobs;

public class CellularAutomatonComponentDataArray : JobComponentSystem
{
    private EntityQuery query;
 
    protected override void OnCreate()
    {
        query = EntityManager.CreateEntityQuery(typeof(CellComponent));
     
        using (var tempArray = new NativeArray<Entity>(1000000, Allocator.Temp))
        {
            EntityManager.CreateEntity(EntityManager.CreateArchetype(typeof(CellComponent)), tempArray);
        }
    }

    protected override JobHandle OnUpdate(JobHandle inputDeps)
    {
        var current = query.ToComponentDataArray<CellComponent>(Allocator.TempJob);
        var next = new NativeArray<CellComponent>(query.CalculateEntityCount(), Allocator.TempJob);
     
        inputDeps = new ProcessComponentNativeArrays() {current = current, next=next}.Schedule(current.Length, 64);
        inputDeps.Complete();
     
        query.CopyFromComponentDataArray(next);
     
        current.Dispose();
        next.Dispose();
     
        return inputDeps;
    }
 
    [BurstCompile]
    private struct ProcessComponentNativeArrays : IJobParallelFor
    {
        [ReadOnly] public NativeArray<CellComponent> current;
        [WriteOnly] public NativeArray<CellComponent> next;
 
        public void Execute(int index)
        {
            next[index] = new CellComponent {state = current[index].state + 1};
        }
    }
}

Could this be the fastest (non-gpu) implementation possible?

Made a simple Game of Life to test the automaton implementation :smile:

Here’s the CA code:

using Unity.Burst;
using Unity.Collections;
using Unity.Entities;
using Unity.Jobs;
using Unity.Mathematics;
using Unity.Transforms;
using UnityEngine;
using Random = UnityEngine.Random;

public class CellularAutomatonComponentDataArray : JobComponentSystem
{
    // Referenced by the renderer system below
    public const int width = 100;
    public const int height = 100;

    private EntityQuery query;

    protected override void OnCreate()
    {
        query = EntityManager.CreateEntityQuery(typeof(CellComponent));

        using (var tempArray = new NativeArray<Entity>(width * height, Allocator.Temp))
        {
            // Allocate entities
            EntityManager.CreateEntity(EntityManager.CreateArchetype(typeof(CellComponent)), tempArray);

            // Set up grid positions and random initial states
            for (var y = 0; y < height; y++)
            {
                for (var x = 0; x < width; x++)
                {
                    EntityManager.AddComponentData(tempArray[ToIndex(x, y)], new Translation { Value = new float3(x - width / 2, y - height / 2, 0) });
                    EntityManager.AddComponentData(tempArray[ToIndex(x, y)], new CellComponent { state = Mathf.RoundToInt(Random.value) });
                }
            }
        }
    }

    protected override JobHandle OnUpdate(JobHandle inputDeps)
    {
        var current = query.ToComponentDataArray<CellComponent>(Allocator.TempJob);
        var next = new NativeArray<CellComponent>(query.CalculateEntityCount(), Allocator.TempJob);

        inputDeps = new ProcessComponentNativeArrays { current = current, next = next }.Schedule(current.Length, 64);
        inputDeps.Complete();

        query.CopyFromComponentDataArray(next);

        current.Dispose();
        next.Dispose();

        return inputDeps;
    }

    [BurstCompile]
    private struct ProcessComponentNativeArrays : IJobParallelFor
    {
        [ReadOnly] public NativeArray<CellComponent> current;
        [WriteOnly] public NativeArray<CellComponent> next;

        public void Execute(int index)
        {
            var pos = FromIndex(index);

            var total = 0;
            total += current[ToIndex(pos.x - 1, pos.y - 1)].state;
            total += current[ToIndex(pos.x - 0, pos.y - 1)].state;
            total += current[ToIndex(pos.x + 1, pos.y - 1)].state;
            total += current[ToIndex(pos.x - 1, pos.y - 0)].state;
            total += current[ToIndex(pos.x + 1, pos.y - 0)].state;
            total += current[ToIndex(pos.x - 1, pos.y + 1)].state;
            total += current[ToIndex(pos.x - 0, pos.y + 1)].state;
            total += current[ToIndex(pos.x + 1, pos.y + 1)].state;

            // Conway's rules: birth on 3 live neighbors, death on <2 or >3, otherwise unchanged
            if (total == 3)
                next[index] = new CellComponent { state = 1 };
            else if (total < 2 || total > 3)
                next[index] = new CellComponent { state = 0 };
            else
                next[index] = current[index];
        }
    }

    private static int Mod(int value, int modulus)
    {
        return (value % modulus + modulus) % modulus;
    }

    // Wraps around the grid edges (toroidal topology)
    public static int ToIndex(int x, int y)
    {
        return Mod(x, width) + Mod(y, height) * width;
    }

    private static int2 FromIndex(int i)
    {
        return new int2(i % width, i / width);
    }
}

And here is the renderer:

using Unity.Burst;
using Unity.Collections;
using Unity.Entities;
using Unity.Jobs;
using Unity.Transforms;
using UnityEngine;

namespace Coherence
{
    [UpdateInGroup(typeof(PresentationSystemGroup))]
    public class CA_TextureRenderer : JobComponentSystem
    {
        private readonly Texture2D texture = new Texture2D(CellularAutomatonComponentDataArray.width, CellularAutomatonComponentDataArray.height, TextureFormat.RGBA32, 0, false);

        private NativeArray<Color32> colors = new NativeArray<Color32>(CellularAutomatonComponentDataArray.width * CellularAutomatonComponentDataArray.height, Allocator.Persistent);

        protected override void OnCreate()
        {
            texture.filterMode = FilterMode.Point;
            // Assumes a SpriteRenderer in the scene displays the simulation texture
            Object.FindObjectOfType<SpriteRenderer>().sprite = Sprite.Create(texture, new Rect(0, 0, texture.width, texture.height), new Vector2(0.5f, 0.5f));
            base.OnCreate();
        }

        protected override void OnDestroy()
        {
            colors.Dispose();
            base.OnDestroy();
        }

        protected override JobHandle OnUpdate(JobHandle inputDeps)
        {
            var colorJob = new ColorJob { colors = this.colors };
            var jobHandle = colorJob.Schedule(this, inputDeps);
            jobHandle.Complete();
            texture.SetPixelData(colorJob.colors, 0);
            texture.Apply();
            return jobHandle;
        }

        [BurstCompile]
        private struct ColorJob : IJobForEach<CellComponent, Translation>
        {
            [NativeDisableParallelForRestriction] public NativeArray<Color32> colors;

            public void Execute(ref CellComponent cell, ref Translation translation)
            {
                colors[CellularAutomatonComponentDataArray.ToIndex((int)(translation.Value.x - CellularAutomatonComponentDataArray.width / 2), (int)(translation.Value.y + CellularAutomatonComponentDataArray.height / 2))] = new Color32((byte)(cell.state * 255), 0, 0, 255);
            }
        }
    }
}

Pretty awesome. How many ‘tiles’ can you process without rendering, roughly? I got up to 2300x2300 (5.29 million) at 13ms with my cobbled together abomination that uses only native arrays. Obviously a different type of solution, but I wanted to see a comparison. What does your CellComponent look like now? I used a tilemap to quickly test it, but that’s obviously too slow for anything practical. I’m really curious what the fastest implementation for rendering would be!

I wanted to optimize it so that I didn’t need safety checks, which I’ll try later too. My plan is to just start at +1,+1 and end at -1,-1 on the grid, setting the borders’ state to 0/dead, and ensuring it doesn’t get drawn. Wanted to do it with full safety checks first.
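
A sketch of that interior-only indexing (untested; reuses the fields from the SetFutureState job below and assumes the border cells are initialized dead and never written):

// Schedule over (width - 2) * (height - 2) iterations instead of the full grid,
// then map each job index to an interior cell so every neighbor access is in bounds.
public void Execute(int jobindex)
{
    int x = 1 + jobindex % (width - 2);
    int y = 1 + jobindex / (width - 2);
    int cell = y * width + x;

    int stateCount =
        currentState[cell - width - 1] + currentState[cell - width] + currentState[cell - width + 1] +
        currentState[cell - 1]                                      + currentState[cell + 1] +
        currentState[cell + width - 1] + currentState[cell + width] + currentState[cell + width + 1];

    futureState[cell] = (byte)(stateCount == 3 || (stateCount == 2 && currentState[cell] == 1) ? 1 : 0);
}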

Oh, as I’m writing this, I just thought of making it one pixel each and writing it via Texture2D (GetRawTextureData and such). Something else I’ll try!
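
Untested sketch of that raw-texture idea: GetRawTextureData<Color32>() exposes the texture’s pixel memory as a NativeArray, so a Burst job can write one pixel per cell and a single texture.Apply() uploads the result.

[BurstCompile]
public struct WritePixels : IJobParallelFor
{
    [ReadOnly] public NativeArray<byte> futureState;
    [WriteOnly] public NativeArray<Color32> pixels;

    public void Execute(int index)
    {
        pixels[index] = futureState[index] == 1
            ? new Color32(255, 255, 255, 255)   // alive
            : new Color32(0, 0, 0, 255);        // dead
    }
}

// Usage, assuming an RGBA32 Texture2D the same size as the grid:
//   var pixels = texture.GetRawTextureData<Color32>();
//   new WritePixels { futureState = futureState, pixels = pixels }
//       .Schedule(futureState.Length, 64).Complete();
//   texture.Apply(); // upload to the GPU once the job is done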

My cobbled together test:

using Unity.Jobs;
using Unity.Collections;
using Unity.Burst;

[BurstCompile]
public struct SetFutureState : IJobParallelFor
{
    [ReadOnly] public NativeArray<byte> currentState;
    public NativeArray<byte> futureState;
    [ReadOnly] public int width;
    [ReadOnly] public int height;


    public void Execute(int jobindex)
    {
        // safety checks;
        bool _left = (jobindex) % width != 0;
        bool _up = jobindex < (width * (height - 1));
        bool _right = (jobindex + 1) % width != 0;
        bool _down = jobindex >= width;

        int _stateCount = 0;

        if (_left)
        {
            _stateCount += currentState[jobindex - 1];
            if (_down)  _stateCount += currentState[jobindex - width - 1];
            if (_up)  _stateCount += currentState[jobindex - 1 + width];
        }

        if (_right)
        {
            _stateCount += currentState[jobindex + 1];
            if (_up)  _stateCount += currentState[jobindex + width + 1];
            if (_down)  _stateCount += currentState[jobindex + 1 - width];
        }
    
        if (_up) _stateCount += currentState[jobindex + width];
        if (_down) _stateCount += currentState[jobindex - width];

        if (currentState[jobindex] == 0)//is dead
        {
            if (_stateCount == 3) futureState[jobindex] = 1; //make alive
        }
        else
        {
            if (_stateCount == 2 || _stateCount == 3) futureState[jobindex] = 1; //make alive
            else futureState[jobindex] = 0; //make dead
        }
    }
}

And a normal MonoBehaviour game controller. Could be converted, but… simple and easy to see what is happening.

using Unity.Jobs;
using Unity.Mathematics;
using UnityEngine;
using Random = UnityEngine.Random;
using Unity.Entities;
using Unity.Collections;
using UnityEngine.Tilemaps;

public class GameController : MonoBehaviour
{
    #region Serialized Fields for Controller
    [SerializeField] private int width = 100;
    [SerializeField] private int height = 100;

    [SerializeField] private Tile aliveTile;
    [SerializeField] private Tile deadTile;

    [SerializeField] private Tilemap GOLTileMap;

    [SerializeField] private bool render = true;
    [SerializeField] private bool useTimer = false;
    [SerializeField] private float updateInterval = 1;
    #endregion

    private float updateCounter = 0;
  
    // The only real data the program needs
    public NativeArray<byte> currentState;
    public NativeArray<byte> futureState;

    // Avoid GC. Won't work if we try to compare current/future states to avoid excessive draws.
    // Really though, tilemap is super slow, only here to test.
    Vector3Int[] positions;
    Tile[] tileArray;

    void Awake()
    {
        currentState = new NativeArray<byte>(width * height, Allocator.Persistent);
        futureState = new NativeArray<byte>(width * height, Allocator.Persistent);

        positions = new Vector3Int[futureState.Length];
        tileArray = new Tile[positions.Length];
      
        for (int i = 0; i < currentState.Length; i++) { currentState[i] = (byte)Random.Range(0, 2); }
    }

    private void Start()
    {
        UpdateFutureState();
        if (render)
        {
            GOLTileMap.gameObject.SetActive(true);
            DrawTileMap(); }
        else
        {
            GOLTileMap.gameObject.SetActive(false);
        };
    }

    void Update()
    {
        if (useTimer)
        {
            updateCounter += Time.deltaTime;
            if (updateCounter >= updateInterval)
            {
                updateCounter -= updateInterval;
                UpdateFutureState();
                if (render) DrawTileMap();
                currentState.CopyFrom(futureState);// update the current state by overwriting from future state
            }
        }
        else
        {
            UpdateFutureState();
            if (render) DrawTileMap();
            currentState.CopyFrom(futureState);// update the current state by overwriting from future state
        }
    }
      
    private void DrawTileMap()
    {
        for (int i = 0; i < positions.Length; i++)
        {
            positions[i] = Arrayto3dGrid(i);
            if (futureState[i] == 0)
            {
                tileArray[i] = deadTile;
            }
            else
            {
                tileArray[i] = aliveTile;
            }
        }

        GOLTileMap.SetTiles(positions, tileArray);
    }

    private void UpdateFutureState()
    {
        // create the job to read current state of neighbours and set the future state of each object
        JobHandle SetFutureStateJH = new SetFutureState
        {
            currentState = currentState,
            futureState = futureState,
            width = width,
            height = height
        }.Schedule(currentState.Length, 64);

        // ensure the job is complete
        SetFutureStateJH.Complete();
    }

    private Vector3Int Arrayto3dGrid(int arrayPosition)
    {
        return new Vector3Int(arrayPosition % width, (int)math.floor(arrayPosition / width), 0);
    }
  
    private void OnDestroy()
    {
        currentState.Dispose();
        futureState.Dispose();
    }
}

Edit:

I changed it to use Texture2D.GetRawTextureData() and write to it in another IJobParallelFor. Got it up to full screen - 1920x1080, so ~2 million cells, operating at ~165 iterations/frames per second. Theoretically, it could probably just barely do 4K at 60 frames.
