[Jobs][Lags] JobTempAlloc has allocations that are more than 4 frames old

Hello,

I’m currently using jobs to work with meshes and data.
I’m using List/Array; both can be resized at any moment inside the job’s Execute().

This message appears in the log, together with a spike/lag:
Internal: JobTempAlloc has allocations that are more than 4 frames old - this is not allowed and likely a leak

What is the best way to work with few allocations in jobs, and why does this message cause a massive spike? (Other than pre-caching, which is not always possible.)

(Editor,Standalone-Mono,Standalone-IL2CPP)

Thanks.


Hi dyox,
This might be a bug. Could you please create a small reproduction project and submit it with a bug report via the bug reporter? If you post the case # here we can process it faster.

Detailed instructions on how to submit bug reports can be found here.

I’ve submitted it (Case 989338); here is the code.

using System.Collections.Generic;
using UnityEngine;
using Unity.Jobs;

public class JobAlloc : MonoBehaviour
{
    static public JobAlloc Instance;

    public struct Job : IJobParallelFor
    {
        public void Execute(int i)
        {
            for (int n = 0; n < Instance.Size; ++n)
                Instance.list[i].Add(i); // <--- Internal: JobTempAlloc has allocations that are more than 4 frames old - this is not allowed and likely a leak
        }
    }

    public int Count = 4096;
    public int Size = 4096;
    public List<int>[] list;

    public JobHandle Handle;
    // Use this for initialization
    void Start () {
        Instance = this;
        list = new List<int>[Count];
    }
  
    // Update is called once per frame
    void Update () {
      
        if(Handle.IsCompleted)
        {
            for (int i = 0; i < Count; ++i)
            {
                list[i] = new List<int>(Count / 8);
            }

            Job job = new Job();
            Handle = job.Schedule(Count, 1);
        }
    }
}

[Update] I’ve found a new way to create this spike/message without any allocation (Mono JIT):

public class JobData : MonoBehaviour
{
    public static readonly byte[][] Datas = new byte[][]
    {
        new byte[] { 0, 1, 2, 3, 4, 5, 6 },
        new byte[] { 0, 1, 2, 3, 4, 5, 6 },
        new byte[] { 0, 1, 2, 3, 4, 5, 6 },
        new byte[] { 0, 1, 2, 3, 4, 5, 6 },
    };
}

public struct Job : IJobParallelFor
{
    public void Execute(int i)
    {
        byte[] a = JobData.Datas[i]; // <--- Internal: JobTempAlloc has allocations that are more than 4 frames old - this is not allowed and likely a leak
    }
}

It appears that using a static field on Mono (not IL2CPP) causes an allocation at least once in the first job accessing it, producing a warning + lag.

Any update ?

You should not access global (i.e. static) state from jobs, as it can introduce race conditions.
You should put a NativeArray inside the job and use that instead.
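For readers landing here, a minimal sketch of what that advice looks like (class and field names are illustrative; copying the NativeArray field into the job struct copies only a small view over the native buffer, not the data itself):

```csharp
using Unity.Collections;
using Unity.Jobs;
using UnityEngine;

public class NativeArrayJobExample : MonoBehaviour
{
    struct DoubleJob : IJobParallelFor
    {
        [ReadOnly] public NativeArray<float> Input; // read-only data handed to the job
        public NativeArray<float> Output;

        public void Execute(int i)
        {
            Output[i] = Input[i] * 2f; // no statics touched; the safety system can verify access
        }
    }

    void Update()
    {
        var input = new NativeArray<float>(1024, Allocator.TempJob);
        var output = new NativeArray<float>(1024, Allocator.TempJob);

        var handle = new DoubleJob { Input = input, Output = output }.Schedule(1024, 64);
        handle.Complete(); // completing within the frame keeps the TempJob allocator happy

        input.Dispose();
        output.Dispose();
    }
}
```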

@UT shouldn’t the job compiler prevent this kind of stuff?

That makes no sense.
Jobs are threads, and threads can access everything.
Accessing a static variable does not in itself mean a race condition; see the example above with a readonly static field.
Passing data only through NativeArray completely undermines the power of multithreading if we have to copy all data from the main thread every time.
Also, many algorithms use lookup tables; how do we use them if we cannot access static fields?


I think @Joachim_Ante_1 mentioned there would be some error checking to come…

Why aren’t you just using NativeArray/NativeList on the main thread too? No need to copy data around…

I looked at the bug and replied a few minutes ago; replying here too in case someone else finds this thread.

Accessing statics is indeed not intended to be allowed and will be protected against in the future. Instead, you should pass native data structures through the job data. In many cases this is not possible with just NativeArray; you also need NativeList, NativeQueue, and NativeHashMap.
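A hedged sketch of what collecting results through one of those containers looks like. In current Unity.Collections the concurrent writer is obtained with AsParallelWriter(); 2018-era versions used NativeQueue&lt;T&gt;.Concurrent instead:

```csharp
using Unity.Collections;
using Unity.Jobs;

struct CollectJob : IJobParallelFor
{
    [ReadOnly] public NativeArray<int> Values;
    public NativeQueue<int>.ParallelWriter Results; // safe multi-producer writer

    public void Execute(int i)
    {
        if (Values[i] % 2 == 0)
            Results.Enqueue(Values[i]); // grows as needed, unlike a fixed NativeArray
    }
}

// Usage on the main thread:
// var queue = new NativeQueue<int>(Allocator.TempJob);
// var handle = new CollectJob { Values = values, Results = queue.AsParallelWriter() }
//                  .Schedule(values.Length, 64);
// handle.Complete();
// while (queue.TryDequeue(out int v)) { /* consume */ }
// queue.Dispose();
```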

The reason for the warning about TempJobAllocation is that the jobs take too long (more than 4 frames), which is currently not supported. The fix is to make sure you call Complete to wait for any jobs that would otherwise run longer than 4 frames.

Calling Complete is also required to clean up the data for the safety system, so you really do need to call it. Calling Complete when IsCompleted is true is almost free, if you want to make sure you do not wait.
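In practice that advice amounts to a per-frame pattern along these lines (a sketch; the field names are illustrative):

```csharp
using Unity.Jobs;
using UnityEngine;

public class JobDriver : MonoBehaviour
{
    JobHandle _handle;
    bool _scheduled;

    void Update()
    {
        if (_scheduled)
        {
            // Complete() both waits (if needed) and lets the safety system
            // clean up. When IsCompleted is already true this is almost free.
            _handle.Complete();
            _scheduled = false;
            // ... read results, then dispose or reuse buffers ...
        }

        // ... schedule the next batch of work and set _scheduled = true ...
    }
}
```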

The main reason this particular job takes so long is that it performs a managed allocation, which is extremely expensive and often makes the jobified solution slower than running the code single-threaded.
Another reason is that jobs are scheduled in batches, and there is no explicit flushing of the batches in this case, which means the jobs are scheduled later than it seems. This can be fixed by calling JobHandle.ScheduleBatchedJobs(); after scheduling.
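That flushing point can be wrapped in a small helper, sketched here (the helper name is illustrative):

```csharp
using Unity.Jobs;

static class Scheduling
{
    // Schedule a parallel-for job and flush the batch immediately, so worker
    // threads can start before the main thread next waits on a handle.
    public static JobHandle ScheduleAndFlush<T>(T job, int length, int innerloopBatchCount)
        where T : struct, IJobParallelFor
    {
        JobHandle handle = job.Schedule(length, innerloopBatchCount);
        JobHandle.ScheduleBatchedJobs(); // kick the worker threads now
        return handle;
    }
}
```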


Hi. I tried to allocate a NativeArray in the job itself; I needed a temporary array for a calculation. According to the profiler it still allocated 302 bytes of GC memory. Is there a way around this allocation when creating a NativeArray?


I thought the idea was to allocate it outside but fill it in inside…


For a single IJob that would work. But imagine you have an IJobParallelFor over 100,000 items. It will be split into multiple jobs based on the inner-loop batch count.

You cannot allocate, from the outside, one temporary native array that all those jobs use.

The issue with the 302 bytes is the same outside of jobs: creating a NativeArray allocates this on the GC heap, independent of the size of the array. If the array had a resize function we could keep a pool of arrays and get no GC, but as it is now it will allocate whenever you create one.

Lennart
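For per-index scratch space, allocating with Allocator.Temp inside Execute is the usual route today (a sketch; note that in the Unity versions discussed above this was exactly the path that still incurred the ~302-byte managed tracking allocation):

```csharp
using Unity.Collections;
using Unity.Jobs;

struct ScratchJob : IJobParallelFor
{
    public NativeArray<float> Output;

    public void Execute(int i)
    {
        // Temp allocations are valid inside jobs and must not outlive Execute.
        var scratch = new NativeArray<float>(16, Allocator.Temp);
        for (int k = 0; k < scratch.Length; ++k)
            scratch[k] = i + k;

        float sum = 0f;
        for (int k = 0; k < scratch.Length; ++k)
            sum += scratch[k];
        Output[i] = sum;

        scratch.Dispose();
    }
}
```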


In such a situation I think you should allocate a contiguous buffer once and access it using an index offset in each job, the same way this is done in CUDA or equivalent solutions.
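A sketch of that CUDA-style pattern: one contiguous buffer allocated up front, with each index working on its own slice (ScratchPerItem is an illustrative constant):

```csharp
using Unity.Collections;
using Unity.Jobs;

struct SlicedScratchJob : IJobParallelFor
{
    public const int ScratchPerItem = 16;

    [NativeDisableParallelForRestriction] // each index touches only its own slice
    public NativeArray<float> Scratch;    // length = itemCount * ScratchPerItem

    public void Execute(int i)
    {
        int offset = i * ScratchPerItem;
        for (int k = 0; k < ScratchPerItem; ++k)
            Scratch[offset + k] = i + k;  // no per-job allocation, no GC
    }
}
```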


It would be helpful in general to say why statics are not allowed; otherwise you simply perpetuate myths about concurrency. There is no race condition that comes just from reading static data on multiple threads; it’s done all the time in many apps, and it’s a very common pattern with concurrent collections in high-throughput applications. There might be good reasons not to do it in the context of how jobs work internally, but that should be clarified. We already have enough ignorance about concurrency in the game development community.


Maybe also clarify why “4 frames old” matters.
We must count frames and call job.Complete() on the main thread, so this sort of multithreading ends up useless and laggy.
At 120 fps, 4 frames is too short for jobs: 120 fps → job.Complete() → lag…

The current system works and keeps the fps stable even at 95% CPU usage, whereas using System.Thread at only 50% CPU usage produces spikes and lags.
(8-core example: 7 jobs + render thread + main thread + 7 System.Threads.)
Maybe just open up jobs and let the user decide what to do with them.


I wonder if filling in one job and processing in a dependent for-job is possible (today is my make-art day, I’m not touching VS).

I think they want to control their update path, and that depends on what data jobs can access.
An [AllowStatics] attribute plus a warning that can be suppressed with a flag would be a nice compromise.

The allocation is caused by a managed object we create for tracking memory leaks. There is currently no way around it. We have some ideas for fixing it for temporary allocations, but I cannot give an estimate on when that will be implemented.

The reason we are not allowing it (at least by default) is that the goal of the Unity job system is to guarantee that there are no race conditions.
Multiple threads reading static data is not a race condition unless someone is writing to it at the same time. With static variables we do not know who else is accessing the data, so we cannot guarantee that no one else is writing it while the job is running.

So, the limitation is not because we know it is a race condition, but because we cannot guarantee that it is not.


So far we have been focusing on using all cores for the simulation rather than making it asynchronous, which means you almost always complete the job within one frame. The 4-frame limit comes from the fact that we are using a specialized allocator for this case, not from actively trying to limit anything.
Long-running asynchronous jobs are slightly different and require some tweaking, but letting you choose a different allocator for such jobs, one which does not have to complete within 4 frames, seems like a good start.
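Until a dedicated allocator for long-running jobs exists, the usual workaround is to back such jobs with Allocator.Persistent, which is not subject to the 4-frame lifetime check (a sketch; SomeLongJob is hypothetical):

```csharp
using Unity.Collections;
using Unity.Jobs;
using UnityEngine;

public class LongRunningJobHost : MonoBehaviour
{
    NativeArray<float> _data;
    JobHandle _handle;

    void Start()
    {
        // Persistent allocations may live for many frames, unlike TempJob.
        _data = new NativeArray<float>(1 << 20, Allocator.Persistent);
        // _handle = new SomeLongJob { Data = _data }.Schedule(); // illustrative
    }

    void OnDestroy()
    {
        _handle.Complete();   // always complete before disposing
        if (_data.IsCreated)
            _data.Dispose();
    }
}
```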