Burst Compatible LINQ extensions

Congratulations on implementing a large part of the LINQ API surface! It’s way more work than it appears to be. Coincidentally, this is very similar to a project I’ve been trying to finish for the last couple of months, and since I’m nowhere near done, I can at least share some of my experience. Currently I don’t have most of the operators implemented, as I’m still figuring out the API and the post-processor.

Cecil is hell, but definitely workable. My post-processor “mostly works”, though there’s a never-ending supply of edge cases to consider. I’m also concerned about performance in larger projects, but it’s fast enough for now. I know DOTS is moving to C# Source Generators, but I’m not sure whether that will be of any help here. Unity moved the codegen post-processors to a separate executable in 2020.2, which also throws a spanner in the works when it comes to debugging - I’m sticking to 2020.1 for now.

Below is a short example of my pipeline. Start with a job:

[BurstCompile]
struct Job1 : IJob
{
    public NativeArray<float3> Buffer;
    public NativeArray<float3> Output;
    public float MinLength;

    public void Execute()
    {
        float minLengthSq = MinLength * MinLength;

        Buffer
            .NativeSequence()
            .Where(x => math.lengthsq(x) > minLengthSq)
            .OrderBy(x => x.y)
            .CopyTo(ref Output)
            .Dispose();
    }
}

The NativeSequence() extension creates a struct of type NativeSequenceContext, which is the universal starting point for all operations. I didn’t bother creating per-operator extension methods for NativeArray because I wanted to provide uniform support for all possible array-like collections (ones where you can get a pointer to the data) and to avoid mixing my stuff up with LINQ (NativeSequenceContext isn’t IEnumerable, but NativeArray is, for some reason). I also decided I wouldn’t be implementing lazy enumeration.
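Simplified, the context boils down to little more than a pointer and a length. A rough sketch of what I mean (field names and the exact entry point here are illustrative, not my real code):

```csharp
using Unity.Collections;
using Unity.Collections.LowLevel.Unsafe;

public unsafe struct NativeSequenceContext<T> where T : struct
{
    internal void* Ptr;    // raw pointer to the source data
    internal int Length;   // element count
}

public static class NativeSequenceExtensions
{
    // Entry point: wraps any array-like collection that can expose a pointer.
    public static unsafe NativeSequenceContext<T> NativeSequence<T>(this NativeArray<T> array)
        where T : struct
    {
        return new NativeSequenceContext<T>
        {
            Ptr = NativeArrayUnsafeUtility.GetUnsafeBufferPointerWithoutChecks(array),
            Length = array.Length,
        };
    }
}
```

Because the context only needs a pointer and a length, the same operators work over anything that can hand out its buffer - that’s the whole point of not tying the extensions to NativeArray.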

An extension that takes a delegate looks like this:

[NativeLambdaMethod]
public static NativeSequenceContext<T> Where<T>(this NativeSequenceContext<T> context, Func<T, bool> predicate) where T : struct

Based on this, a new Burst-compatible method is emitted in the first post-processing step. (I might write these by hand eventually to reduce the amount of codegen and eliminate the need to post-process the assembly containing the extension methods.)

[NativeLambdaMethod]
public static NativeSequenceContext<T> Where_CodeGen0<T, TLambda0>(this NativeSequenceContext<T> context, ref TLambda0 lambda0) where T : struct where TLambda0 : IFunctor_CodeGen_Where0<T>

As you can tell, I replace the delegate with a generic parameter constrained to an interface emitted based on the signature of the delegate’s Invoke method.
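For the Where example, the emitted interface just mirrors Func&lt;T, bool&gt;.Invoke, and the generated method can call the functor through the generic constraint - no delegate anywhere, so Burst is happy and the call inlines away. A heavily simplified sketch (the real generated body differs; the in-place compaction here is just one plausible strategy):

```csharp
using Unity.Collections.LowLevel.Unsafe;

// Emitted interface mirroring the delegate's Invoke signature (simplified):
public interface IFunctor_CodeGen_Where0<T> where T : struct
{
    bool Invoke(T arg);
}

public static unsafe NativeSequenceContext<T> Where_CodeGen0<T, TLambda0>(
    this NativeSequenceContext<T> context, ref TLambda0 lambda0)
    where T : struct
    where TLambda0 : IFunctor_CodeGen_Where0<T>
{
    int count = 0;
    for (int i = 0; i < context.Length; i++)
    {
        var v = UnsafeUtility.ReadArrayElement<T>(context.Ptr, i);
        if (lambda0.Invoke(v))                                  // direct, inlinable call
            UnsafeUtility.WriteArrayElement(context.Ptr, count++, v); // compact in place
    }
    context.Length = count;
    return context;
}
```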

The second post-processing step patches the usages of the extension methods. Our job now looks like this:

public void Execute()
{
    var lambda = new <>c__DisplayClass3_0();
    lambda.minLengthSq = MinLength * MinLength;

    var context = Buffer
        .NativeSequence()
        .Where_CodeGen0(ref lambda);
 
    var lambda2 = default(<>c.LambdaStruct_CodeGen_<Execute>b__3_10);

    context
        .OrderBy_CodeGen0(in lambda2)
        .CopyTo(ref Output)
        .Dispose();
}

Each delegate was converted to a struct implementing the previously emitted IFunctor interface. In the case of lambdas that capture variables, Roslyn emits a display class, so we convert the display class to a struct and make it implement the interface instead. The ECS Entities.ForEach post-processor deals with similar problems and was a very useful reference here.
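Roughly, the converted display class for the Where lambda above ends up looking like this - a plain struct with the captured locals as fields (names here are illustrative; the real generated names are mangled):

```csharp
using Unity.Mathematics;

// The captured local minLengthSq becomes a field; the lambda body becomes Invoke.
struct DisplayClass3_0_Converted : IFunctor_CodeGen_Where0<float3>
{
    public float minLengthSq; // captured local, assigned before the call

    public bool Invoke(float3 x)
    {
        return math.lengthsq(x) > minLengthSq;
    }
}
```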

I have also started experimenting with job-level operation scheduling - this allows each operation to be implemented as an appropriate job type (e.g. allowing for parallel selection and sorting). Getting this to work will probably require me to re-implement all of the operators separately, though. The API looks like this:

buffer
    .NativeSequenceJob()
    .Where(x => x > 0)
    .Select(math.sqrt)
    .ToList(out var list, Allocator.Temp)
    .Schedule()
    .Complete();

In SystemBase you’d probably use it like this:

Dependency = buffer
    .NativeSequenceJob(Dependency)
    .Where(x => x > 0)
    .Select(math.sqrt)
    .ToList(out var list, Allocator.Temp)
    .Schedule();

I think overall using delegates this way is an INCREDIBLY powerful technique and it results in very efficient assembly (pretty close to what you’d write by hand, all of the wrappers are optimized away). I wish Burst supported this natively.

Additionally, I have found that structuring code this way has unexpected performance benefits. Splitting data processing into multiple small operations (as opposed to doing it in one large loop) often results in better vectorization, fewer branch-prediction issues, etc.

For example, a query starting with .Where(…) deals with all branching up front, and the subsequent .Select(…) operations are much easier for the compiler to optimize. It’s not unusual for these short LINQ-like queries to execute 5x faster than a naive hand-written loop (all on a single thread).
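A contrived illustration of what the staged passes boil down to (not my actual generated code, just the shape of it):

```csharp
using Unity.Mathematics;

// Pass 1 (Where): all the branching happens here, compacting into a scratch buffer.
int count = 0;
for (int i = 0; i < input.Length; i++)
    if (input[i] > 0f)
        scratch[count++] = input[i];

// Pass 2 (Select): a branch-free, straight-line loop over dense data,
// which Burst can vectorize much more aggressively than the fused version.
for (int i = 0; i < count; i++)
    output[i] = math.sqrt(scratch[i]);
```

The fused loop mixes the comparison and the sqrt in one body, so the compiler has to keep the branch inside the hot loop; the split version trades a scratch buffer for two loops the optimizer actually likes.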
