Job reading from a Collection does not allow another System to write to it on the next frame

Unity 2022.3.18f1
Entities 1.2.0
Collections 2.4.0

Let me explain the issue.

We have 2 Systems and 1 Job - WriterSystem, ReaderSystem & ReadFromListJob. The ReaderSystem updates after the WriterSystem and executes the Job ReadFromListJob.

For the first 3 frames the safety system allows me to write to the NativeList from the WriterSystem, but from the 4th frame onward it throws an InvalidOperationException, which states “The previously scheduled job ReaderSystem:ReadFromListJob reads from the Unity.Collections.NativeList”

As far as I am aware, scheduling a Job locks the safety handle of the respective Collection, and the Job releases that safety handle once it completes. Furthermore, accessing a Collection from the main thread while its safety handle is locked throws an Exception, which absolutely makes sense.

Thus, how is it possible that the WriterSystem, which always updates before the ReaderSystem (+ ReadFromListJob), is not allowed to write to the NativeList? ReadFromListJob completes before the next frame, so the safety handle should be released, allowing writes to the Collection.

Personally, I don’t know how to check exactly which Jobs, or how many, still hold the safety handle. I tried using the static functions of AtomicSafetyHandle but wasn’t able to find a way.

The only ways I found to prevent this conundrum are applying the NativeDisableContainerSafetyRestriction Attribute to the Collection inside the ReadFromListJob, or calling state.CompleteDependency() in the WriterSystem. In my opinion neither is an elegant solution.

I am curious if anyone else has/had this problem and how they have dealt with it.

EDIT:
As tertle mentioned below, completing the Dependency of the whole State isn’t a preferable solution; the suggestion was to complete only the dependency on the Singleton Entity. I have adapted the code to include this solution.

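using Unity.Collections; // NativeList, Allocator, [ReadOnly]
using Unity.Entities;    // ISystem, SystemState, SystemAPI, EntityQuery
using Unity.Jobs;        // IJob, Schedule

[UpdateAfter(typeof(WriterSystem))]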
[CreateAfter(typeof(WriterSystem))]
[UpdateInGroup(typeof(ReaderWriterSystemGroup))]
public partial struct ReaderSystem : ISystem
{
    public void OnCreate(ref SystemState state)
    {
        state.RequireForUpdate<SingletonTestCollection>();
    }

    public void OnDestroy(ref SystemState state) { }

    public void OnUpdate(ref SystemState state)
    {
        var testList = SystemAPI.GetSingleton<SingletonTestCollection>().testList;
 
        state.Dependency = new ReadFromListJob
        {
            list = testList
        }.Schedule(state.Dependency);
    }

    private struct ReadFromListJob : IJob
    {
        //[NativeDisableContainerSafetyRestriction]
        [ReadOnly]
        public NativeList<int> list;

        public void Execute()
        {
            UnityEngine.Debug.Log($"Reading {list.Length} items.");
        }
    }
}

[UpdateBefore(typeof(ReaderSystem))]
[CreateBefore(typeof(ReaderSystem))]
[UpdateInGroup(typeof(ReaderWriterSystemGroup))]
public partial struct WriterSystem : ISystem
{
    private int _iterationCount;
    private EntityQuery _clearCollectionSafety;

    public void OnCreate(ref SystemState state)
    {
        state.RequireForUpdate<SingletonTestCollection>();
        state.EntityManager.CreateSingleton( new SingletonTestCollection { testList = new NativeList<int>(32, Allocator.Persistent) });
        _iterationCount = 0;
        _clearCollectionSafety = SystemAPI.QueryBuilder().WithAllRW<SingletonTestCollection>().Build();
    }

    public void OnDestroy(ref SystemState state)
    {
        state.Dependency = SystemAPI.GetSingletonRW<SingletonTestCollection>().ValueRW.testList.Dispose(state.Dependency);
    }
 
    public void OnUpdate(ref SystemState state)
    {
        //state.CompleteDependency();
        //_clearCollectionSafety.CompleteDependency();

        var testList = SystemAPI.GetSingletonRW<SingletonTestCollection>().ValueRW.testList;
        UnityEngine.Debug.Log("Adding from Writer System. Iteration: " + ++_iterationCount);
        testList.Add(0);
    }
}

public struct SingletonTestCollection : IComponentData
{
    public NativeList<int> testList;
}

public partial class ReaderWriterSystemGroup : ComponentSystemGroup { }

What is happening here is that some other system is completing the JobHandle from ReaderSystem for the first 3 frames, but then that system stops running, probably due to its queries no longer matching. Assume the problem occurs every frame.

As for the actual problem, I suspect there are some weird rules with singletons and JobHandles that I can’t be bothered to understand. I use my own solution, because Unity hacked singletons into oblivion.

I believe that may very well be the case, but then I wonder why this would happen.
I even tried to manually reset the safety, but couldn’t figure out how to successfully accomplish this.

Indeed I have come across your framework and truly admire your dedication to it. However, I haven’t found a specific reason to incorporate it into my project yet. Do your Singletons address this dependency problem by any chance?

So to understand the problem here you have to understand the black box that is the safety system.

JobHandle.Complete doesn’t just make a job finish on the spot, it also releases the safety handles that the job holds.

This is why, even if you check JobHandle.IsCompleted, you still need to call Complete() before you access any native container the job uses.
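To illustrate the point about polling versus completing, here is a minimal sketch outside of Entities. The class name SafetyReleaseExample and the job ReadJob are hypothetical and only serve as an illustration; the assumption is a plain MonoBehaviour that owns a persistent NativeList. It polls JobHandle.IsCompleted, but still calls Complete() before the main thread touches the list.

```csharp
using Unity.Collections;
using Unity.Jobs;
using UnityEngine;

// Hypothetical example class, not part of the original post.
public class SafetyReleaseExample : MonoBehaviour
{
    private NativeList<int> _list;
    private JobHandle _handle;

    private struct ReadJob : IJob
    {
        [ReadOnly] public NativeList<int> list;

        public void Execute()
        {
            Debug.Log($"Reading {list.Length} items.");
        }
    }

    private void Start()
    {
        _list = new NativeList<int>(32, Allocator.Persistent);
        _handle = new ReadJob { list = _list }.Schedule();
    }

    private void Update()
    {
        // IsCompleted only reports that the job has finished executing.
        if (!_handle.IsCompleted)
            return;

        // Complete() is what hands the container back to the main thread.
        // Without this call, the Add below is rejected by the safety system.
        _handle.Complete();

        _list.Add(0);

        // Schedule the reader again for the next round.
        _handle = new ReadJob { list = _list }.Schedule();
    }

    private void OnDestroy()
    {
        // Make sure no job still uses the list before disposing it.
        _handle.Complete();
        _list.Dispose();
    }
}
```

Removing the _handle.Complete() line in Update should reproduce the same kind of InvalidOperationException as in the original post, regardless of how quickly the job finishes.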

What’s happening in your case is that nothing has called Complete() between when the job was scheduled and when you try to access the container on the main thread, so even though the job may have finished executing, it hasn’t released its safety yet.

Basically, if you ever do work on the main thread with a singleton container, you must call Complete to be safe. For performance you can do this on an entity query containing just the singleton component instead of on state.Dependency, since the latter might also complete jobs that have nothing to do with the container.
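As a rough sketch of that suggestion, the writer side could look like the following. It assumes the SingletonTestCollection component from the original post is defined elsewhere in the project and that the singleton entity has already been created; the system name WriterSystemQueryComplete and the field name _singletonQuery are hypothetical.

```csharp
using Unity.Collections;
using Unity.Entities;

// Hypothetical variant of the WriterSystem from the original post.
// Assumes SingletonTestCollection (an IComponentData holding a
// NativeList<int> testList) exists and its singleton was created elsewhere.
public partial struct WriterSystemQueryComplete : ISystem
{
    private EntityQuery _singletonQuery;

    public void OnCreate(ref SystemState state)
    {
        state.RequireForUpdate<SingletonTestCollection>();

        // A query that only touches the singleton component, with write access.
        _singletonQuery = SystemAPI.QueryBuilder()
            .WithAllRW<SingletonTestCollection>()
            .Build();
    }

    public void OnUpdate(ref SystemState state)
    {
        // Completes only the jobs registered against SingletonTestCollection,
        // rather than everything tracked by state.Dependency.
        _singletonQuery.CompleteDependency();

        var testList = SystemAPI.GetSingletonRW<SingletonTestCollection>().ValueRW.testList;
        testList.Add(0);
    }
}
```

This relies on the reading system registering its job against the singleton component, which happens in the original post because ReaderSystem fetches the singleton through SystemAPI and assigns the scheduled job to state.Dependency.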

In my opinion, the Job should automatically release the safety, since it is not accessing the Collection anymore. It even seems like this was the case before, but now it isn’t anymore?

That’s a very good idea. It still doesn’t feel like an elegant solution, but it is way better than completing the whole state dependency.

No. It has always worked the way it is currently working. You must always call Complete(). The reason for that behavior is that if safety were released automatically, the access would fail only on the rare frame where the job takes way longer than usual; every other time it would succeed and you would never see the bug. This way, you get errors no matter how long the job takes, forcing you to write correct code every time.

I see, in that case I must have misunderstood how the docs described the procedure. Thank you both for reaching out!