How to avoid long-running background jobs being executed on the main thread?

Calling JobHandle.Complete() on a job that isn’t finished yet makes the main thread work on other jobs while it waits. This seems reasonable, but recently I’ve run into an interesting edge case. I have a number of long-running Large (~30ms) async jobs running in the background, and sometimes calling JobHandle.Complete() on a completely unrelated Small job grabs a pending Large job while it waits for the Small job to complete. This results in the Large background job running on the main thread, which obviously causes bad frame spikes, since these jobs were never intended to run on the main thread.

How can I guard against this? The fact that calling JobHandle.Complete() has the potential to start running any pending job on the main thread makes it difficult to have long-running background jobs, since I seem to always run the risk of accidentally running one of these background jobs on the main thread.

This seems to have nothing to do with Job dependencies either. I can have a completely isolated single Small job with no dependencies, call Complete() on it, and start running background tasks on the main thread.

If you want multi-frame jobs, you need to do 3 things.

  1. Only work on data that isn’t used anywhere else - this means ALL entity data is off limits unless it’s a copy. Clone your (entity) data and work on that instance then copy it back when it’s done

  2. Don’t pass your dependency handle back to the entity world - this way it’ll never be completed in the chain except if…

  3. DON’T USE ALL THREADS - you must limit your job to fewer threads than are available. In fact you should probably only use 1 thread max. If you ever hit a situation where all threads are occupied and the main thread needs to wait on a Complete(), then your job will be Completed to ensure the main thread can continue as fast as possible.

An example system would look something like this

using Unity.Collections;
using Unity.Entities;
using Unity.Jobs;

// Job, WriteResult and GetData are placeholders that are not shown in this post
public class TestClass : SystemBase
{
    private JobHandle handle;
    private NativeArray<int> myData;

    protected override void OnUpdate()
    {
        if (!handle.IsCompleted)
        {
            // Last frame's job is still running, check again next frame
            return;
        }

        // Job finished but still need to finalize
        handle.Complete();

        // Write back myData to whatever entities etc you want. This must be a copy, you can't pass actual entity data here (buffers, chunkarchetypes etc)
        WriteResult(myData);

        // Get a new copy of the data from entities etc
        myData = GetData();

        // Passing in the entity dependency is fine but you can't write back to it
        handle = new Job() { MyData = myData }.Schedule(this.Dependency);
    }
}

It absolutely annoys me that people have passed on the idea that long-running jobs are OK. It was only a matter of time until someone ran into an issue like this. The C# Job System is designed for distributing main thread work onto worker threads, assuming those worker threads even exist, which is not a guarantee. Having a job intentionally run for multiple frames under that paradigm just isn’t a good idea.

The better solution is to create a custom background thread and .Run() the job there. Or just break up the jobs into smaller pieces so they don’t run so long.
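As a rough sketch of the custom background thread idea, the helper below uses only standard .NET threading and is not taken from any post in this thread. BackgroundWork and its members are hypothetical names. It assumes the work is plain C# that does not call Unity APIs, since most of those are restricted to the main thread. Whether a job struct's .Run() can safely be called from such a thread is not something this sketch verifies.

```csharp
using System;
using System.Threading;

// Hypothetical helper: runs one long computation on a dedicated thread,
// outside the Unity job queue, so JobHandle.Complete() cannot pick it up.
public sealed class BackgroundWork<T>
{
    private readonly Thread thread;
    private T result;
    private Exception error;
    private volatile bool done;

    public BackgroundWork(Func<T> work)
    {
        thread = new Thread(() =>
        {
            try { result = work(); }
            catch (Exception e) { error = e; }
            finally { done = true; }
        })
        { IsBackground = true, Name = "LargeWork" };
        thread.Start();
    }

    // Poll this once per frame instead of blocking the main thread.
    public bool IsDone => done;

    // Blocks until the thread has finished, then returns the result
    // or rethrows the exception raised by the work delegate.
    public T GetResult()
    {
        thread.Join();
        if (error != null) throw error;
        return result;
    }
}
```

A caller would create the BackgroundWork when the large computation is needed, check IsDone in each Update, and call GetResult only once IsDone is true, so the main thread never stalls on it.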

Unfortunately I can reproduce this even when following these three steps. For one, I am not using ECS; this is purely a job system issue. Additionally, I am not even close to saturating the worker threads: I can reproduce this with just a few single-threaded jobs. I am uploading my example script here for testing, as well as a sample from my profiling showing an instance where the LargeJob was grabbed by the main thread while calling Complete() for a SmallJob.

I was afraid that would be the solution :(

HOWEVER: From the example script I posted, you can see that it isn’t just a matter of multi-frame jobs, it can also affect very simple situations, like with the example I posted with only 3 regular jobs. This is just three jobs, and the LargeJob could easily be a single-frame job that is meant to be completed near the end of the frame. But in this case, the LargeJob is being completed too early, which would lead to that frame being dropped.

Attachments:
JobCompleteTest.cs (1.23 KB)
ProfilerTest.png

I didn’t say “multi-frame jobs”, I said “long-running jobs”, and even in your simple case, your large job is too long.

Anyways, there are quite a few workarounds to this problem:

  1. Optimize the large job (Burst helps a lot)
  2. Break up the large job into multiple smaller jobs
  3. Make the large job parallel
  4. Run small jobs on the main thread while the large job is running.
  5. Dispatch the large job to a background thread. Thread contention isn’t an issue if Unity’s worker threads are going to be idle.
  6. Use a pointer to an atomic variable in the large job to see if the job has been initiated by a worker thread, and if not, spin on the JobHandle.IsCompleted of the job you are waiting on until either the large job has been initiated or the handle reports completion. Combine with JobHandle.ScheduleBatchedJobs for better efficacy.
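Workaround 2 could look roughly like the sketch below. SliceJob, SlicedScheduling and the doubling operation are hypothetical placeholders, not code from the attached script. The idea is that if Complete() on some other handle does pick up a slice on the main thread, the stall should be limited to one slice rather than the whole ~30ms job. It assumes the work can be split by index range.

```csharp
using System;
using Unity.Collections;
using Unity.Jobs;

// Hypothetical job: processes only the range [Start, Start + Count) of Data.
public struct SliceJob : IJob
{
    public NativeArray<float> Data;
    public int Start;
    public int Count;

    public void Execute()
    {
        for (int i = Start; i < Start + Count; i++)
        {
            Data[i] = Data[i] * 2f; // placeholder for the real per-element work
        }
    }
}

public static class SlicedScheduling
{
    // Schedules one SliceJob per slice, each depending on the previous one,
    // and returns the handle of the last slice.
    public static JobHandle Schedule(NativeArray<float> data, int sliceSize, JobHandle dependsOn = default)
    {
        if (sliceSize <= 0) throw new ArgumentOutOfRangeException(nameof(sliceSize));

        JobHandle handle = dependsOn;
        for (int start = 0; start < data.Length; start += sliceSize)
        {
            var job = new SliceJob
            {
                Data = data,
                Start = start,
                Count = Math.Min(sliceSize, data.Length - start)
            };
            handle = job.Schedule(handle);
        }
        return handle;
    }
}
```

The slices are chained through their dependencies so they can share the same NativeArray without the safety system complaining, which also means only one slice is runnable at a time. The slice size is a tuning knob: smaller slices shorten the worst-case stall but add scheduling overhead.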