Generating chunks of noise in jobs

I am trying to parallelize chunk loading in my game. I’m considering several things and jobs seem really nice. I’ve never used jobs before but I’m quite familiar with low-level data management and I have some ancient experience with multithreading. Trying to overcome the limitations posed by jobs is quite the challenge though. I am NOT using ECS, this is pure job system and MonoBehaviours.

I’m trying to generate a 2D array of noise points in a job. The first hurdle I needed to overcome was that I use the FastNoise2 library. This is a library written without jobs in mind (it has state through a sort of node system, and it’s even a native library), but in theory, generating noise only reads from the noise generator and doesn’t write. My current solution is to use the [NativeDisableUnsafePtrRestriction] attribute and it seems to be working (it gets past the noise gen line without errors). If anyone has a better idea for this, let me know, but this is not the biggest problem (worst case, I’ll write the noise generating function myself, I don’t need state anyway).

The second hurdle is outputting the points. I’m really breaking my head over this and I seem to be getting nowhere. The (simplified) structure of my job is as follows:

public struct ChunkData {
    public Vector2Int startOffset;
    public float[] noiseData; // <----- What do I put here???
}

public struct GenerateNoiseJob : IJobParallelFor {
    [] public FastNoise noiseGen;
    public NativeArray<ChunkData> chunks;

    public void Execute(int chunkIndex) {
        Vector2Int offset = chunks[chunkIndex].startOffset;
        float[] noiseMap = noiseGen.Gen2D(size, size, offset.x, offset.y); // actual call differs but this is the gist
        for (int i = 0; i < noiseMap.Length; i++) {
            // Do some additional processing....
            chunks[chunkIndex].noiseData[i] = noiseMap[i];
        }
    }
}

For each chunk, I need to generate a noise map and then I need to process that noise map before I send it back to the managed code. Fairly simple. Except that the above code is not allowed, despite it not containing any data that could change. I know this, but the compiler doesn’t and there lies the problem. The noiseData array is of known size and will only contain blittable data types, but any sort of nested collection is disallowed inside that Data struct. On top of that, I’d honestly really like it if the data was not copied at all, but instead had a reference to a pre-allocated chunk of memory in which it could modify the data. I can make sure this stays thread-safe.

So the second question is, with nested arrays disallowed, how can I operate (r+w) on a fixed-sized collection of pure, blittable data for each job in a parallel job? Possible things I’ve found are doing some unsafe things with IntPtrs (I suppose that’s entirely manual data management), or using something like the FixedListFloat32 for points. Disadvantages are that IntPtr is better avoided (but I can avoid copying!), and the FixedList really only supports three data types of different sizes (packing is an option but also more work), so I can’t move a lot of the processing logic to the job if I go that route.
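One common way around the nested-collection restriction is to skip nesting altogether: allocate a single flat NativeArray<float> for all chunks and let each Execute call write only to its own slice. Below is a minimal sketch of that idea, not a definitive implementation. It assumes Unity.Mathematics' noise.snoise as a stand-in for the FastNoise2 call, and the field names (size, frequency, startOffsets) are made up for illustration.

```csharp
using Unity.Burst;
using Unity.Collections;
using Unity.Jobs;
using Unity.Mathematics;

[BurstCompile]
public struct GenerateNoiseFlatJob : IJobParallelFor
{
    public int size;          // points per chunk edge (hypothetical field)
    public float frequency;   // noise frequency (hypothetical field)

    [ReadOnly] public NativeArray<int2> startOffsets; // one entry per chunk

    // One flat buffer shared by all chunks. Chunk i owns the range
    // [i * size * size, (i + 1) * size * size). The attribute disables the
    // check that a parallel job only writes to the index passed to Execute,
    // so keeping the slices disjoint is now our responsibility.
    [NativeDisableParallelForRestriction]
    public NativeArray<float> noiseData;

    public void Execute(int chunkIndex)
    {
        int2 offset = startOffsets[chunkIndex];
        int baseIndex = chunkIndex * size * size;

        for (int y = 0; y < size; y++)
        {
            for (int x = 0; x < size; x++)
            {
                // Stand-in for the FastNoise2 call: simplex noise from Unity.Mathematics.
                float2 p = new float2(offset.x + x, offset.y + y) * frequency;
                noiseData[baseIndex + y * size + x] = noise.snoise(p);
            }
        }
    }
}
```

Scheduling would look like `job.Schedule(chunkCount, 1)`, and after completion `noiseData.GetSubArray(i * size * size, size * size)` gives a view of chunk i without copying. Nothing is copied per chunk inside the job either; every worker writes straight into the pre-allocated buffer.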

The third hurdle is perhaps the most complicated and might deserve its own thread. I’ve not properly researched this yet, but the noise data is being used to load in-world chunks. Chunks don’t need to load all at once in the frame where they are requested; they can arrive staggered at some point in time after they are requested. The IJobParallelFor seems to wait until all parallel jobs are done before it returns. If it needs to generate 13 chunks on 6 workers, and assuming every chunk takes exactly as long, 12 chunks are already done while the 13th is still being generated, but none of the data can be used until that last chunk is done. This is inefficient; I can create a GameObject the second the data for that job is ready. No matter how many other chunks still need to be generated.

I suppose for the last question I’m wondering how to approach this with jobs. IJobParallelFor seems to be out of the question (single blit back and forth for the entire batch). A plain IJob seems okay from what I understand about the system but I don’t have enough information to know if I can actually leverage that to achieve what I described above. (I don’t know if it actually works that way).
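For the staggered-arrival case, one approach that should work is scheduling one plain IJob per chunk and polling each JobHandle.IsCompleted from Update, so a chunk can be consumed as soon as its own job finishes. The sketch below is only an illustration under assumptions: ChunkLoader, SingleChunkNoiseJob, BuildChunk, the chunk size of 64 and the 0.02f frequency are all hypothetical, and Unity.Mathematics noise again stands in for the real generator.

```csharp
using System.Collections.Generic;
using Unity.Burst;
using Unity.Collections;
using Unity.Jobs;
using Unity.Mathematics;
using UnityEngine;

[BurstCompile]
public struct SingleChunkNoiseJob : IJob
{
    public int size;
    public int2 startOffset;
    public NativeArray<float> noiseData; // owned by exactly one job

    public void Execute()
    {
        for (int i = 0; i < noiseData.Length; i++)
        {
            // Row-major layout: x = i % size, y = i / size.
            float2 p = new float2(startOffset.x + i % size, startOffset.y + i / size) * 0.02f;
            noiseData[i] = noise.snoise(p);
        }
    }
}

public class ChunkLoader : MonoBehaviour
{
    struct Pending
    {
        public JobHandle handle;
        public NativeArray<float> data;
        public int2 offset;
    }

    const int Size = 64; // hypothetical chunk edge length
    readonly List<Pending> pending = new List<Pending>();

    public void RequestChunk(int2 offset)
    {
        // Persistent, because the job may outlive the few frames TempJob allows.
        var data = new NativeArray<float>(Size * Size, Allocator.Persistent);
        var job = new SingleChunkNoiseJob { size = Size, startOffset = offset, noiseData = data };
        pending.Add(new Pending { handle = job.Schedule(), data = data, offset = offset });
        JobHandle.ScheduleBatchedJobs(); // ask the workers to start right away
    }

    void Update()
    {
        // Iterate backwards so RemoveAt does not shift unvisited entries.
        for (int i = pending.Count - 1; i >= 0; i--)
        {
            if (!pending[i].handle.IsCompleted) continue;

            // Complete() is still required before the main thread may read the data;
            // here it returns immediately because the job has already finished.
            pending[i].handle.Complete();
            BuildChunk(pending[i].offset, pending[i].data);
            pending[i].data.Dispose();
            pending.RemoveAt(i);
        }
    }

    void OnDestroy()
    {
        // Never dispose a buffer while its job may still be writing to it.
        foreach (var p in pending)
        {
            p.handle.Complete();
            p.data.Dispose();
        }
    }

    // Hypothetical consumer: create the GameObject, mesh, etc. from the noise.
    void BuildChunk(int2 offset, NativeArray<float> data) { }
}
```

With this shape the 12 finished chunks from the example above are picked up on the next Update, while the 13th keeps running on its worker.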

Obviously I can also use plain C# threads. I’m more familiar with the workflow and I think it would boost performance significantly. Most of the issues I mentioned here can be solved fairly trivially with some clever programming. But partly as a learning challenge, and partly because this is the Unity way of doing things and therefore likely the most performant, I’d really like to use the job system. I would think what I’m trying to achieve is the kind of thing the system is designed for, so I assume there are many things I’m unaware of…

Hold your breath. :face_in_clouds:

Waiting for a race condition of sorts. No, honestly, using a C/C++ library from within a Burst-compiled job is dangerous unless you know for certain that this library code behaves nicely when called from a background thread and neither spawns its own threads nor mucks with other threads.

Something like “noise” is so universal I bet you can quickly find a burst/jobs optimized version made specifically for Unity.

You omitted an attribute here?

Write this like so and check if it still compiles:
[BurstCompile] public struct GenerateNoiseJob : IJobParallelFor {

Without Burst compilation you’ll leave most of the performance benefits of using Jobs by the wayside.

Try one of the “unsafe” collections like UnsafeNativeList. The unsafe ones can be nested in other native collections. They are marked “unsafe” because they perform no integrity checks (i.e. bounds).

You can, but the results will be far from exciting. Just a single lock in the wrong place can make heavily multithreaded code run slower than single-threaded code doing the same thing!

Even if you are able to avoid all these pitfalls of managed threads, bursted jobs will still be a factor of 10 to 100 faster - easily!

So ultimately, your biggest problem here is the use of the FastNoise library because it will return you a managed array and that’s no good for jobs.

Yeah that’s fair. I’m really short on time and this library is nice (and uses SIMD instructions!), that’s pretty much why I’m using it. It’s simple enough that I can just have the source code open on a second monitor and follow the call chain (no multithreading in the entire project and I can find no obvious state changes). But it definitely isn’t ideal and long-term I am planning to just assemble my own noise functions that directly burst-compile from C#.

Should mention that the noise library doesn’t actually return a managed object, it only copies data to a pointer. I simplified that because I didn’t think it mattered too much, but the actual call looks like this:

var noiseData = new float[size * size];
noise.GenUniformGrid2D(noiseData, chunk.worldLocation.x, chunk.worldLocation.y, size, size, 0.02f, 1337);

Obviously that means noiseData can just be any pointer to an appropriately sized chunk of contiguous allocated memory.

Oop, should be [NativeDisableUnsafePtrRestriction] (will edit)

I think you mean UnsafeList? That’s the one I can find. Some googling into this led me to a thread that mentioned ParallelWriter (link: Unsafe containers), in a very similar context as mine. Looks promising!

But I’m still not entirely sold on this approach. It seems overkill because I know the arrays allocated for each job are unique to that job alone, and no other job or thread is going to read or write to it until the job is completed. At this point I’m half considering just passing a pointer to each job, pointing to an allocated array for that chunk, taking some ideas from this article (link: Run managed code in Unity’s Job System – COFFEE BRAIN GAMES), but I’m pretty sure this also prevents burst compilation.

I should think that a thread that generates only new data on its own, and just needs read-only access to a noise generator (worst case I copy it), shouldn’t be running into many race conditions, if at all?

Both UnsafeList<T> and NativeList<T> do bounds checks in the editor (and these are stripped in builds in both cases), the “unsafe” part refers to the native container safety system. But I agree that nested UnsafeList<T> is probably the way to go.
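To make the nested variant concrete, here is a rough sketch of the ChunkData struct from the question with an UnsafeList<float> in place of the managed array. It assumes a recent Collections package where UnsafeList<T> has an indexer, and it again uses Unity.Mathematics noise as a placeholder for the real generator; NestedChunkSetup and its methods are hypothetical helpers.

```csharp
using Unity.Burst;
using Unity.Collections;
using Unity.Collections.LowLevel.Unsafe;
using Unity.Jobs;
using Unity.Mathematics;

public struct ChunkData
{
    public int2 startOffset;
    public UnsafeList<float> noiseData; // unsafe containers may be nested
}

[BurstCompile]
public struct GenerateNoiseNestedJob : IJobParallelFor
{
    public int size;
    public NativeArray<ChunkData> chunks;

    public void Execute(int chunkIndex)
    {
        // This copies the small struct, but the list inside still points at the
        // same unmanaged memory, so writes through it land in the original buffer.
        ChunkData chunk = chunks[chunkIndex];
        UnsafeList<float> data = chunk.noiseData;

        for (int i = 0; i < data.Length; i++)
        {
            float2 p = new float2(chunk.startOffset.x + i % size, chunk.startOffset.y + i / size) * 0.02f;
            data[i] = noise.snoise(p); // placeholder for the FastNoise2 call
        }
    }
}

public static class NestedChunkSetup
{
    // Hypothetical helper: allocates one fixed-size list per chunk.
    public static NativeArray<ChunkData> Allocate(NativeArray<int2> offsets, int size)
    {
        var chunks = new NativeArray<ChunkData>(offsets.Length, Allocator.Persistent);
        for (int i = 0; i < offsets.Length; i++)
        {
            var list = new UnsafeList<float>(size * size, Allocator.Persistent);
            list.Resize(size * size, NativeArrayOptions.ClearMemory); // capacity alone leaves Length at 0
            chunks[i] = new ChunkData { startOffset = offsets[i], noiseData = list };
        }
        return chunks;
    }

    // The safety system does not track the inner lists, so free them by hand.
    public static void Free(NativeArray<ChunkData> chunks)
    {
        for (int i = 0; i < chunks.Length; i++)
        {
            chunks[i].noiseData.Dispose();
        }
        chunks.Dispose();
    }
}
```

The trade-off is that the inner lists get no help from the job safety system: nothing stops two jobs from writing to the same list, and a forgotten Dispose is a silent leak.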

If you want to have a single managed array and use that memory everywhere without copying, you can pin the array and get the pointer to its memory. Maybe wrap it in an UnsafeList if you want to avoid dealing with pointers. You need to be careful to unpin the array when you’re done using it, so it’s not that much different from manual memory management.

T[] array = ...;
void* ptr = UnsafeUtility.PinGCArrayAndGetDataAddress(array, out var handle);
var unsafeList = new UnsafeList<T>((T*)ptr, array.Length);
// ...
UnsafeUtility.ReleaseGCObject(handle);

This lets you use Burst and jobs with a managed array, but it might be easier to just use native containers everywhere.

The job system isn’t great for async execution of long-running tasks. It’s usually more about scheduling fast jobs that complete later in the same frame. Note that scheduling too many long-running jobs will occupy workers that are needed by other game systems, which will tank your framerate. You can maybe work around this with careful scheduling and processing of smaller workloads.

Guess what the Burst compiler does to your code? :wink:

Yes, it generates vectorized SIMD instructions whenever it can. Where it can’t, restructuring the code so that it can becomes part of your optimization process. I’ve seen the same code (functionality-wise) go from being five times faster to being over 200 times faster than the single-threaded managed version. The Burst debugger is quite helpful just to see the change from scalar to vectorized assembly instructions.

That’s good because with the unsafe keyword you should be able to iterate over this chunk of memory even inside a job. I believe you can even initialize a NativeArray with a pointer. Also check out UnsafeUtility; the Collections package has extensions to that.
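As far as I know, wrapping an existing pointer in a NativeArray goes through NativeArrayUnsafeUtility.ConvertExistingDataToNativeArray. The sketch below combines that with the pinning call shown earlier in the thread. PinnedFloatBuffer is a hypothetical helper class, and it needs "Allow unsafe code" enabled for the assembly.

```csharp
using System;
using Unity.Collections;
using Unity.Collections.LowLevel.Unsafe;

// Hypothetical helper: exposes a managed float[] as a NativeArray<float> without copying.
public sealed unsafe class PinnedFloatBuffer : IDisposable
{
    readonly ulong gcHandle;
#if ENABLE_UNITY_COLLECTIONS_CHECKS
    readonly AtomicSafetyHandle safety;
#endif

    public NativeArray<float> View;

    public PinnedFloatBuffer(float[] managed)
    {
        // Pin the array so the GC cannot move it while native code holds the pointer.
        void* ptr = UnsafeUtility.PinGCArrayAndGetDataAddress(managed, out gcHandle);

        // Allocator.None: the NativeArray does not own the memory and must not free it.
        View = NativeArrayUnsafeUtility.ConvertExistingDataToNativeArray<float>(
            ptr, managed.Length, Allocator.None);

#if ENABLE_UNITY_COLLECTIONS_CHECKS
        // With safety checks on (editor), the array needs a safety handle before use.
        safety = AtomicSafetyHandle.Create();
        NativeArrayUnsafeUtility.SetAtomicSafetyHandle(ref View, safety);
#endif
    }

    public void Dispose()
    {
#if ENABLE_UNITY_COLLECTIONS_CHECKS
        AtomicSafetyHandle.Release(safety);
#endif
        // Unpin only after every job using View has completed.
        UnsafeUtility.ReleaseGCObject(gcHandle);
    }
}
```

A job can then take `buffer.View` like any other NativeArray<float>, and the results show up directly in the managed array once the job has completed.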

Thanks for mentioning this, this was not something I would’ve thought to think about but it makes a lot of sense. All things considered I think I’ll just make the noise generation function itself jobified instead of the entire chunk. That is the meat and bones of it, after all. I could use coroutines or C# threads to handle the asynchronous, different-frame loading of the entire chunks after the noise data has been generated. (Although I’m not yet super sure where to call jobHandle.Complete() in this approach.)

Super interesting snippet here, looks very promising. Although, as you say, I think it might just be easier to use a nested UnsafeList<T> directly without messing with memory management, and take the allocation/copy cost for what it is (can’t be anything too wild). If I just use jobs to generate single points of noise I don’t even think I’ll need nested collections.

Yeah I’ve noticed this. I made a quick attempt at trying to burst compile the FastNoiseLite library (the native C# one, not the dll plugin) inside a job and wouldn’t you know it, this works right away. I’m not sure if the performance gain is as good as the FastNoise2 dll, but honestly, it’s bound to be better than the lite C# version and that was more than fine already.