IJobForEach only uses one core?

I have a job using IJobForEach that is trivially parallel across entities and when I look in the debugger I see this:

[Attached screenshot: rvo_job.PNG]

I was under the impression that IJobForEach supported parallelism across cores. If that is not the case, what is the intended workflow for implementing such behavior?

I’ve also been unable to remove the parallel write restrictions from BufferFromEntity when using it in multiple IJobs that I schedule manually in a loop. Is this intended behavior or a bug?

Are you using .Schedule() or .ScheduleSingle() on your job?
Are you using a command buffer? If so is it marked as concurrent?

It parallelizes work per chunk.

IJobForEach runs in parallel over chunks. How many chunks are you processing?
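As a rough illustration of why a small entity count can end up on a single core, here is a back-of-the-envelope sketch in plain C#. It assumes 16 KiB chunks and ignores chunk header overhead; `bytesPerEntity` and the entity counts are hypothetical numbers, not values from this thread.

```csharp
using System;

static class ChunkEstimate
{
    // Assumption: each chunk is 16 KiB; header overhead is ignored here.
    const int ChunkBytes = 16 * 1024;

    // bytesPerEntity is hypothetical: the summed component sizes of the
    // archetype plus the 8-byte Entity id stored alongside them.
    public static int EntitiesPerChunk(int bytesPerEntity) => ChunkBytes / bytesPerEntity;

    public static int ChunkCount(int entityCount, int bytesPerEntity)
    {
        int perChunk = EntitiesPerChunk(bytesPerEntity);
        return (entityCount + perChunk - 1) / perChunk; // ceiling division
    }

    static void Main()
    {
        // 128 bytes per entity gives 128 entities per chunk.
        Console.WriteLine(ChunkCount(100, 128));  // prints 1: one chunk, one unit of parallel work
        Console.WriteLine(ChunkCount(1000, 128)); // prints 8: up to eight units of parallel work
    }
}
```

Under these assumptions, a scene with a hundred expensive agents would give a per-chunk job only one piece of work to hand out, no matter how costly each entity is.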

Parallel writes from multiple jobs are not supported as far as I can tell (the concurrent versions of the containers allow parallel writes from within one job).

edit: sorry, forgot to press send and in the meanwhile there are already 2 replies…

@siggigg I’m using Schedule and no command buffers.

I probably do not have enough entities to break across a chunk border in this example. Having chunks be the only determining factor for IJobForEach parallelism isn’t great if that’s true. ECS principles aren’t about how many entities you can shove on screen at once, but how efficiently you can split up work across them.

Parallel write to most containers is definitely supported via the NativeDisableParallelForRestriction and NativeDisableContainerSafetyRestriction attributes. If you know what you are doing it is perfectly reasonable and expected that you need this functionality.

Also, if per-chunk concurrency is the intended design and not just a current limitation that will be lifted in the future, it would be great if the docs mentioned it. Currently they say: “If you used the Schedule() method instead, the system uses parallel jobs to process the entities.”

You can always use the generic IJobParallelFor to go parallel.
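A minimal sketch of that suggestion, assuming the per-item results fit in fixed-size slices of a flat NativeArray. `PerItemJob`, `PerItemScheduling`, and `resultsPerItem` are hypothetical names, and the multiplication is only a stand-in for the real per-entity work.

```csharp
using Unity.Burst;
using Unity.Collections;
using Unity.Jobs;

// Hypothetical job: one expensive work item per index, each writing its own slice.
[BurstCompile]
public struct PerItemJob : IJobParallelFor
{
    public int resultsPerItem;

    [ReadOnly] public NativeArray<float> input;

    // Writes land outside the index range the safety system assigns to each
    // batch, so the restriction is lifted. The slices must never overlap.
    [NativeDisableParallelForRestriction]
    public NativeArray<float> output;

    public void Execute(int index)
    {
        int start = index * resultsPerItem;
        for (int j = 0; j < resultsPerItem; j++)
            output[start + j] = input[index] * (j + 1); // stand-in for the real work
    }
}

public static class PerItemScheduling
{
    // output is assumed to hold input.Length * resultsPerItem elements.
    public static JobHandle Schedule(NativeArray<float> input, NativeArray<float> output,
                                     int resultsPerItem, JobHandle dependency)
    {
        var job = new PerItemJob
        {
            resultsPerItem = resultsPerItem,
            input = input,
            output = output
        };

        // An innerloopBatchCount of 1 lets every index be picked up by a different worker.
        return job.Schedule(input.Length, 1, dependency);
    }
}
```

A batch count of 1 trades scheduling overhead for load balancing, which tends to pay off only when each item is expensive. This sketch does not cover resizable per-entity output or BufferFromEntity access.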

Which container have you been able to write to from multiple IJobs scheduled in parallel? I recall having trouble doing this a few ECS versions back (I have not used ECS in a while).

If I have a regular job containing a mutable BufferFromEntity that is marked to disable safety checks, when I actually pull a buffer out of it inside of a job the safety check system blows up and complains that concurrent writes are not allowed.

I haven’t had issues with other containers so far.

Also an issue with IJobParallelFor is you cannot schedule it and express to the job system your dependency on an EntityQuery. So if I want to use BufferFromEntity within it I cannot because it would be unsafe for the job system to schedule.

I did extensive performance tests on a custom collision system a few months back. Initially I tried to parallelize as much as possible, but parallel writes to containers are costly (you gain much less than you do by scheduling independent jobs in parallel). I don’t know your setup, though.

I think I used a buffer with an interlock to write to it in parallel (it worked, but was not worth it in the end). Let me see if I can find it.

I’m not writing to the same DynamicBuffer at once. I have N entities, each of which I am doing a CPU intensive task on, and individually they write to their own buffers/component data. It’s a perfect use case for disabling parallel safety checks.

Basically I’m going to need to allocate a bunch of intermediate result containers via Allocator.Temp, have my IJobParallelFor write to those instead of a DynamicBuffer, and then have another job at the end run via an IJobForEach and copy the contents into the per-entity DynamicBuffer. I then also need another system at the end of the frame to force-complete the job and make a callback to deallocate the temp data.

This is the third time I’ve run into this problem on my project, so unfortunately it’s not a terribly uncommon use case if you have some demanding CPU tasks.

Maybe this helps (it uses the old API).

SpriteToGridBufferParallelInterlockedJob is the job of interest to you.
Update: the job is scheduled here:

        JobHandle MyUpdateJobified(JobHandle jobHandle)
        {
            //sampler.Begin();
          
            ClearNativeContainers();
          
            var collisionBufferCandiateArray = collisionCandidateKeys_Group.GetBufferArray<CollisionInfoBuffer>();
            var collisionBufferPairArray = collisionPairKeys_Group.GetBufferArray<CollisionPairBuffer>();
  
            // Clear spatial grid - ParallelFor
            jobHandle = new ClearBufferDictionaryJob
            {
                collisionCandidateKeyArray                = collisionBufferCandiateArray,
                collisionPairKeyArray                    = collisionBufferPairArray
            }.Schedule(collisionBufferCandiateArray.Length, 1, jobHandle);
          
            // Assign sprites to spatial grid - IJobProcessComponentDataWithEntity
            var bufferLocks = new NativeArray<int>(collisionBufferCandiateArray.Length, Allocator.TempJob);
            jobHandle = new SpriteToGridBufferParallelInterlockedJob
            {
                grid                                    = myGrid,
                keyArray                                = collisionBufferCandiateArray,
                bufferLocksArray                        = bufferLocks
            }.ScheduleGroup(sprite_Group, jobHandle);
          
            // Check collisions per grid - ParallelFor
            jobHandle = new AABBCollisionBufferToBufferJob
            {
                collisionCandidates             = collisionBufferCandiateArray,
                collidingEntities                = collisionBufferPairArray
            }.Schedule(collisionBufferCandiateArray.Length, 1, jobHandle);
          
            // Merge collisions per grid into a unique collision pair / collision entity hashmap - ParallelFor
            jobHandle = new MergeCollisionsPerGridDistinctFromBufferJob
            {
                InputBuffer                        = collisionBufferPairArray,
                DistinctCollisionPairHashMap    = distinctCollisionPairHashMap.ToConcurrent(),
                DistinctCollisionEntityHashMap  = distinctCollisionEntityHashMap.ToConcurrent()
            }.Schedule(collisionBufferPairArray.Length, 1, jobHandle);
      
            // Color Colliding Entities - IJobProcessComponentDataWithEntity
            jobHandle = new ColorCollidingEntitiesJob
            {
                DistinctCollisionEntityHashMap        = distinctCollisionEntityHashMap,
                hitColor                            = Settings.hitColor,
                normColor                            = Settings.normColor
            }.Schedule(this, jobHandle);
          
            // start processing all scheduled jobs
            JobHandle.ScheduleBatchedJobs();
          
            //jobHandle.Complete();
            //sampler.End();
          
            return jobHandle;
        }

SpriteToGridBufferParallelInterlockedJob

    [BurstCompile]
    public struct SpriteToGridBufferParallelInterlockedJob : IJobProcessComponentDataWithEntity<Box>
    {
        [ReadOnly] public ColGrid grid;
        [NativeDisableParallelForRestriction, WriteOnly] public BufferArray<CollisionInfoBuffer> keyArray;
        [NativeDisableParallelForRestriction, DeallocateOnJobCompletion] public NativeArray<int> bufferLocksArray;
          
        public void Execute(Entity e, int i, [ReadOnly] ref Box box)
        {  
            var boxMinGrid = (int2) ((box.Center - box.Extends - grid.Min) * grid.OneOverCellSize);
            var boxMaxGrid = (int2) ((box.Center + box.Extends - grid.Min) * grid.OneOverCellSize);
  
            for (int x = boxMinGrid.x; x <= boxMaxGrid.x; x++)
            {
                if (x >= 0 && x < grid.Dim.x)
                {
                    for (int y = boxMinGrid.y; y <= boxMaxGrid.y; y++)
                    {
                        if (y >= 0 && y < grid.Dim.y)
                        {
                            var key = x + y * grid.Dim.x;
                            unsafe
                            {
                                while(Interlocked.CompareExchange(ref ((int*)bufferLocksArray.GetUnsafePtr())[key], -1, 0) != 0) {}  
                            }
                              
                            keyArray[key].Add(new CollisionInfoBuffer{entity = e, box = box});
                            bufferLocksArray[key] = 0;
                        }
                    }
                }
            }
        }
    }
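The per-cell locking in the job above can be illustrated outside Unity. The following is a minimal plain .NET sketch of the same spin-lock idea, not the poster's code: `locks`, `counts`, and the cell count are hypothetical, `Parallel.For` stands in for the job workers, and it uses 1 rather than -1 as the "held" value.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class SpinLockDemo
{
    static void Main()
    {
        const int cells = 4;
        var locks = new int[cells];  // 0 = free, 1 = held
        var counts = new int[cells]; // stand-in for the per-cell buffers

        Parallel.For(0, 10000, i =>
        {
            int key = i % cells;

            // Acquire: spin until this thread is the one that swaps 0 -> 1.
            while (Interlocked.CompareExchange(ref locks[key], 1, 0) != 0) { }

            counts[key]++; // critical section: only one thread per cell at a time

            // Release: a store with release semantics, so the update above
            // is visible before another thread can take the lock.
            Volatile.Write(ref locks[key], 0);
        });

        Console.WriteLine(string.Join(",", counts)); // prints 2500,2500,2500,2500
    }
}
```

Spinning like this keeps worker threads busy while they wait, which is consistent with the earlier remark that the approach worked but was not worth it in the end.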

That’s using a deprecated API. ArchetypeChunk has a GetBufferAccessor on it now, but that seems usable only within an IJobChunk, which has the same issue as IJobForEach. I need maximum per-entity parallelism.

Yes, I wrote that it uses the old API. Actually, I just saw that I updated it at the time. This version is a bit newer, but it also uses a deprecated API (the job rename, etc.).

Update, scheduling the job:

        JobHandle MyUpdateJobified(JobHandle jobHandle)
        {
            //Profiler.BeginSample("COLLISION");
            ClearNativeContainers();
           
            var collisionBufferCandiateFromEntity = GetBufferFromEntity<CollisionInfoBuffer>();
   
            // Clear spatial grid - ParallelFor
            jobHandle = new ClearCandidateBufferFromEntityDictionaryParallelForJob
            {
                collisionBufferEntityArray                = collisionBufferEntityArray,
                collisionBufferCandiateFromEntity        = collisionBufferCandiateFromEntity
            }.Schedule(collisionBufferEntityArray.Length, 1, jobHandle);
           
            // Assign sprites to spatial grid - IJobProcessComponentDataWithEntity
            var bufferLocks = new NativeArray<int>(myGrid.CellCount, Allocator.TempJob);
            jobHandle = new SpriteToGridBufferNewApiParallelInterlockedJob
            {
                grid                                    = myGrid,
                keyArray                                = collisionBufferCandiateFromEntity,
                keyIndexArray                            = collisionBufferEntityArray,
                bufferLocksArray                        = bufferLocks
            }.ScheduleGroup(sprite_Group, jobHandle);
           
            // Check collisions per grid - ParallelFor
            jobHandle = new AABBCollisionBufferToHashMapNewApiJob
            {
                collisionCandidates                     = collisionBufferCandiateFromEntity,
                DistinctCollisionPairHashMap            = distinctCollisionPairHashMap.ToConcurrent(),
                DistinctCollisionEntityHashMap          = distinctCollisionEntityHashMap.ToConcurrent()
            }.ScheduleGroup(collisionCandidateKeys_Group, jobHandle);
       
            // Color Colliding Entities - IJobProcessComponentDataWithEntity
           
            jobHandle = new ColorCollidingEntitiesJob
            {
                DistinctCollisionEntityHashMap            = distinctCollisionEntityHashMap,
                hitColor                                = Settings.hitColor,
                normColor                                = Settings.normColor
            }.Schedule(this, jobHandle);
           
           
            // start processing all scheduled jobs
            JobHandle.ScheduleBatchedJobs();
            //jobHandle.Complete();
            //    Profiler.EndSample();
            return jobHandle;
        }

job (newer api)

    [BurstCompile]
    public struct SpriteToGridBufferNewApiParallelInterlockedJob : IJobProcessComponentDataWithEntity<Box>
    {
        [ReadOnly] public ColGrid grid;
        [NativeDisableParallelForRestriction, WriteOnly] public BufferFromEntity<CollisionInfoBuffer> keyArray;
        [ReadOnly] public NativeArray<Entity> keyIndexArray;
        [NativeDisableParallelForRestriction, DeallocateOnJobCompletion] public NativeArray<int> bufferLocksArray;
           
        public void Execute(Entity e, int i, [ReadOnly] ref Box box)
        {   
            var boxMinGrid = (int2) ((box.Center - box.Extends - grid.Min) * grid.OneOverCellSize);
            var boxMaxGrid = (int2) ((box.Center + box.Extends - grid.Min) * grid.OneOverCellSize);
   
            for (int x = boxMinGrid.x; x <= boxMaxGrid.x; x++)
            {
                if (x >= 0 && x < grid.Dim.x)
                {
                    for (int y = boxMinGrid.y; y <= boxMaxGrid.y; y++)
                    {
                        if (y >= 0 && y < grid.Dim.y)
                        {
                            var pos = x + y * grid.Dim.x;
                            var key = keyIndexArray[pos];
                            unsafe
                            {
                                while(Interlocked.CompareExchange(ref ((int*)bufferLocksArray.GetUnsafePtr())[pos], -1, 0) != 0) {}   
                            }
                           
                            keyArray[key].Add(new CollisionInfoBuffer{entity = e, box = box});
                            bufferLocksArray[pos] = 0;
                        }
                    }
                }
            }
        }
    }

I’m not sure I see how this is different though. IJobProcessComponentData == IJobForEach and IJobForEach apparently only supports concurrency per-chunk. So same issue with it not allowing for per-entity concurrency.

Regardless, thanks for the suggestions. I’m waiting to see a Unity response to this for some additional clarification.

Per-entity parallelism is not the best choice, which is why Unity uses per-chunk logic for IJFE (IJobForEach): with per-entity scheduling you pay job scheduling overhead for every entity. If you still want it, the only way (and the best one) is to use IJobParallelFor with a low batch count.

You can, provided you write safely.

Hold on. You had two issues:

  • IJobForEach → I said use IJobParallelFor if you do not want the per chunk parallelism of IJobForEach

  • parallel write to buffer → the above examples with the old api (although I am not sure I understood you correctly)

I cannot access DynamicBuffer in this job type safely (as far as I’m aware, please point me to an example if it exists!). So then I must externally allocate some temp data array storage. But then I cannot provide an array of native arrays (or whatever, could be NativeQueue) for the IJobParallelFor to lookup into by index. So right back to using N * IJob.

Those examples above use either an old API, or the IJobForEach equivalent that has issues outlined in my previous post.

If no one has sent you an example by tomorrow, I’ll show you one then, because I’m in bed now :)

I disagree. The assumption here is that you have so many entities that “of course” you’d want to avoid job scheduling overhead by batching them. But in my example I am running an RVO algorithm that is very expensive per entity. If I only ever have as many entities as fit in a single chunk (or two), I am hitting a major bottleneck.

In my post above I outline why this isn’t an option due to the need for a re-sizable native container per-entity.

I have not found an example that shows how to express an EntityQuery dependency in a normal job type. Every example I’ve seen writes to pre-allocated native containers and then eventually writes back into the ECS data.

Yes please! It’s using the most recent API right?