Using local scopes in SystemBase.OnUpdate can cause captured variables to silently get wrong values

I just found and reported this very evil codegen bug. It was a debugging nightmare to work out, so I’m posting here so others can avoid it.

The short version: Don’t use local scopes in SystemBase.OnUpdate! They break things subtly and weirdly.

I’ve found that introducing local scopes in OnUpdate can cause captured variables to take on incorrect values, with no warning that anything has gone wrong.

I’ve only found one specific way to trigger it so far, but I’d suggest that we consider all local scopes to be potentially broken until we hear about a fix, because when this breaks it’s very hard to notice and very hard to track down.

Here’s the reproduction that explains the bug.

class ReproSystem : SystemBase
{
    protected override void OnUpdate()
    {
        var foo = 0f;

        // WEIRD SCOPE!
        // Comment out this pair of braces and suddenly logging output is correct!
        {
            // Only happens when there's a dependency on something inside the scope.
            var bar = 1f;

            this.Job.WithCode(() =>
            {
                foo = bar;
            }).Run();
        }

        // SHOULD LOG 1
        // When braces are present, logs 0
        // When braces are not present, logs 1
        UnityEngine.Debug.Log(foo);

        // A second job that consumes the relevant variable seems to be necessary to trigger the bug.
        this.Job.WithCode(() => {
            foo = 5;
        }).Run();
    }
}
2 Likes

Case number is 1318432.

Don’t anyone dare comment on the fact that I’m still using Windows 7.

Phew. I need some coffee after this one!

Seems like a side-effect of C# rather than codegen.

Since there’s two scopes now (due to second .WithCode lambda), most likely temporary variable is created instead for the first block. To which value from foo is copied. But since its a value type - it gets discarded after exiting first local scope.

This is pure speculation ofc, I haven’t investigated what actually codegen would genered, but this is what common sense tells me.

Why use local scopes anyway in OnUpdate? It makes everything look even more horrible.
Like there isn’t other sources of horrible extra tab spaces that obscure readability already.

If you need to split code, just use methods (static methods are supported in codegen scopes).

I’m familiar with C#'s behaviours in this area and I’m 99% sure that this is not expected C# behaviour.

We can show this by making an equivalent pure C# version.

private static void RunLambda(LambdaSingleJobDescriptionConstructionMethods.WithCodeAction lambda)
{
    lambda();
}

private static void Test()
{
    var foo = 0f;

    {
        var bar = 1f;
        RunLambda(() =>
        {
            foo = bar;
        });
    }

    UnityEngine.Debug.Log(foo);

    RunLambda(() => {
        foo = 5;
    });
}

This runs as expected without this broken behaviour.

Actually though, I first tried to create that RunLambda version inside OnUpdate, and have now found a slightly different version of the bug where it gives an actual compiler error!

class ReproSystem : SystemBase
{

    // Gives compiler error
    // (0,0): error error DCICE006: Next instruction used in constructing the DisplayClass needs to be a store local instruction in OnUpdate
    // Seeing this error indicates a bug in the dots compiler.
    // We'd appreciate a bug report (About->Report a Bug...). Thnx! <3
    protected override void OnUpdate()
    {
        var foo = 0f;

        // WEIRD SCOPE!
        // Comment out this pair of braces and suddenly the compiler error goes!
        {
            // Only happens when there's a dependency on something inside the scope.
            var bar = 1f;

            RunLambda(() =>
            {
                foo = bar;
            });
        }

        // A second job that consumes the relevant variable seems to be necessary to trigger the bug.
        this.Job.WithoutBurst().WithCode(() =>
        {
            foo = 5;
        }).Run();
    }

    private static void RunLambda(LambdaSingleJobDescriptionConstructionMethods.WithCodeAction lambda)
    {
        lambda();
    }
}
}

Again, removing the braces fixes it. That should remove any doubt that this is a Burst/Entities bug.

I’ll be filing this version as a report too.

2 Likes

I’ve had the same type of problem and had to remove a Foreach and to it “manually” to get it to work. I appreciate your bug report!

This is not equivalent at all :slight_smile: Your initial code wouldn’t look like this at all, it will look absolutely different (see below).

#define ENABLE_PROFILER
using System.Runtime.CompilerServices;
using Unity.Burst;
using Unity.Entities;
using Unity.Jobs;
using Unity.Jobs.LowLevel.Unsafe;
using Unity.Profiling;
using UnityEngine;

namespace Systems
{
    internal class ReproSystem : SystemBase
    {
        [BurstCompile]
        [NoAlias]
        [Unity.Entities.DOTSCompilerGenerated]
        private struct <>c__DisplayClass_OnUpdate_LambdaJob0 : IJob
        {
            public float bar;

            public float foo;

            private static InternalCompilerInterface.JobRunWithoutJobSystemDelegate s_RunWithoutJobSystemDelegateFieldNoBurst;

            private static InternalCompilerInterface.JobRunWithoutJobSystemDelegate s_RunWithoutJobSystemDelegateFieldBurst;

            [MethodImpl(MethodImplOptions.AggressiveInlining)]
            internal void OriginalLambdaBody()
            {
                foo = bar;
            }

            public void ReadFromDisplayClass(ref <>c__DisplayClass0_1 displayClass)
            {
                bar = displayClass.bar;
                foo = displayClass.CS$<>8__locals1.foo;
            }

            public void WriteToDisplayClass(ref <>c__DisplayClass0_1 displayClass)
            {
                displayClass.bar = bar;
                displayClass.CS$<>8__locals1.foo = foo;
            }

            public void Execute()
            {
                OriginalLambdaBody();
            }

            public void ScheduleTimeInitialize(ReproSystem componentSystem, ref <>c__DisplayClass0_1 displayClass)
            {
                ReadFromDisplayClass(ref displayClass);
            }

            [BurstCompile]
            [Unity.Entities.MonoPInvokeCallback(typeof(InternalCompilerInterface.JobRunWithoutJobSystemDelegate))]
            public unsafe static void RunWithoutJobSystem(void* jobData)
            {
                ((<>c__DisplayClass_OnUpdate_LambdaJob0*)jobData)->Execute();
            }
        }

        [Unity.Entities.DOTSCompilerGenerated]
        [BurstCompile]
        [NoAlias]
        private struct <>c__DisplayClass_OnUpdate_LambdaJob1 : IJob
        {
            public float foo;

            private static InternalCompilerInterface.JobRunWithoutJobSystemDelegate s_RunWithoutJobSystemDelegateFieldNoBurst;

            private static InternalCompilerInterface.JobRunWithoutJobSystemDelegate s_RunWithoutJobSystemDelegateFieldBurst;

            [MethodImpl(MethodImplOptions.AggressiveInlining)]
            internal void OriginalLambdaBody()
            {
                foo = 5f;
            }

            public void ReadFromDisplayClass(ref <>c__DisplayClass0_0 displayClass)
            {
                foo = displayClass.foo;
            }

            public void WriteToDisplayClass(ref <>c__DisplayClass0_0 displayClass)
            {
                displayClass.foo = foo;
            }

            public void Execute()
            {
                OriginalLambdaBody();
            }

            public void ScheduleTimeInitialize(ReproSystem componentSystem, ref <>c__DisplayClass0_0 displayClass)
            {
                ReadFromDisplayClass(ref displayClass);
            }

            [BurstCompile]
            [Unity.Entities.MonoPInvokeCallback(typeof(InternalCompilerInterface.JobRunWithoutJobSystemDelegate))]
            public unsafe static void RunWithoutJobSystem(void* jobData)
            {
                ((<>c__DisplayClass_OnUpdate_LambdaJob1*)jobData)->Execute();
            }
        }

        private ProfilerMarker <>OnUpdate_LambdaJob0_profilerMarker;

        private ProfilerMarker <>OnUpdate_LambdaJob1_profilerMarker;

        protected unsafe override void OnUpdate()
        {
            <>c__DisplayClass0_0 displayClass = default(<>c__DisplayClass0_0);
            displayClass.foo = 0f;
            <>c__DisplayClass0_1 displayClass2 = default(<>c__DisplayClass0_1);
            displayClass2.CS$<>8__locals1 = displayClass;
            displayClass2.bar = 1f;
            _ = base.Job;
            <>c__DisplayClass_OnUpdate_LambdaJob0 jobData = default(<>c__DisplayClass_OnUpdate_LambdaJob0);
            jobData.ScheduleTimeInitialize(this, ref displayClass2);
            CompleteDependency();
            InternalCompilerInterface.JobRunWithoutJobSystemDelegate functionPointer = JobsUtility.JobCompilerEnabled ? <>c__DisplayClass_OnUpdate_LambdaJob0.s_RunWithoutJobSystemDelegateFieldBurst : <>c__DisplayClass_OnUpdate_LambdaJob0.s_RunWithoutJobSystemDelegateFieldNoBurst;
            <>OnUpdate_LambdaJob0_profilerMarker.Begin();
            try
            {
                InternalCompilerInterface.RunIJob(ref jobData, functionPointer);
            }
            finally
            {
                <>OnUpdate_LambdaJob0_profilerMarker.End();
            }
            jobData.WriteToDisplayClass(ref displayClass2);
            Debug.Log(displayClass.foo);
            _ = base.Job;
            <>c__DisplayClass_OnUpdate_LambdaJob1 jobData2 = default(<>c__DisplayClass_OnUpdate_LambdaJob1);
            jobData2.ScheduleTimeInitialize(this, ref displayClass);
            CompleteDependency();
            InternalCompilerInterface.JobRunWithoutJobSystemDelegate functionPointer2 = JobsUtility.JobCompilerEnabled ? <>c__DisplayClass_OnUpdate_LambdaJob1.s_RunWithoutJobSystemDelegateFieldBurst : <>c__DisplayClass_OnUpdate_LambdaJob1.s_RunWithoutJobSystemDelegateFieldNoBurst;
            <>OnUpdate_LambdaJob1_profilerMarker.Begin();
            try
            {
                InternalCompilerInterface.RunIJob(ref jobData2, functionPointer2);
            }
            finally
            {
                <>OnUpdate_LambdaJob1_profilerMarker.End();
            }
            jobData2.WriteToDisplayClass(ref displayClass);
        }

        protected internal unsafe override void OnCreateForCompiler()
        {
            base.OnCreateForCompiler();
            <>c__DisplayClass_OnUpdate_LambdaJob0.s_RunWithoutJobSystemDelegateFieldNoBurst = <>c__DisplayClass_OnUpdate_LambdaJob0.RunWithoutJobSystem;
            <>c__DisplayClass_OnUpdate_LambdaJob0.s_RunWithoutJobSystemDelegateFieldBurst = InternalCompilerInterface.BurstCompile(<>c__DisplayClass_OnUpdate_LambdaJob0.s_RunWithoutJobSystemDelegateFieldNoBurst);
            <>OnUpdate_LambdaJob0_profilerMarker = new ProfilerMarker("OnUpdate_LambdaJob0");
            <>c__DisplayClass_OnUpdate_LambdaJob1.s_RunWithoutJobSystemDelegateFieldNoBurst = <>c__DisplayClass_OnUpdate_LambdaJob1.RunWithoutJobSystem;
            <>c__DisplayClass_OnUpdate_LambdaJob1.s_RunWithoutJobSystemDelegateFieldBurst = InternalCompilerInterface.BurstCompile(<>c__DisplayClass_OnUpdate_LambdaJob1.s_RunWithoutJobSystemDelegateFieldNoBurst);
            <>OnUpdate_LambdaJob1_profilerMarker = new ProfilerMarker("OnUpdate_LambdaJob1");
        }
    }
}

Unity using Roslyn’s compiler DisplayClass approach and has issues with scopes etc in Unity’s modification. Unity improving this and moving forward from display classes (as far as I know from chat with them about this topic last time in 2020). Issue here is that Roslyn generates two display classes (0_0 and 0_1) for this foo and bar and they both in a different one and both Job.WithCode writes to different DisplayClass and in case of second display class it writes foo into own DisplayClass.CS$<>8__locals1 and you might think “Hey why it’s not by reference its called Class” but it’s not. Both DisplayClass0_0 and DisplayClass0_1 are structs, and now you see that foo in DisplayClass.CS$<>8__locals1 it’s a copy, not reference and despite DisplayClasss-es itself passed with ref, CS$<>8__locals1 field not reference to DisplayClass0_0 but copy and Job changing this value of the foo to bar in DisplayClass0_1, but it wouldn’t affect original DisplayClass0_0 from which you getting foo for Debug.Log.

And the exact equivalent of compiled by Roslyn code in C# will work exactly the same:

public struct TestDisplay1 //Value type
    {
        public float foo;
    }
 
    public struct TestDisplay2 //Value type
    {
        public float bar;
        public TestDisplay1 locals; //Value type field
    }

    public class ReproSystem : SystemBase
    {
        private static void RunLambda(LambdaSingleJobDescriptionConstructionMethods.WithCodeAction lambda)
        {
            lambda();
        }
        protected override void OnUpdate()
        {
            var displayClass1 = default(TestDisplay1);
            displayClass1.foo = 0f;
      
            var displayClass2 = default(TestDisplay2);
            displayClass2.locals = displayClass1; //Take a close look at this especially on declaration
            displayClass2.bar = 1f;

            RunLambda(() =>
            {
                displayClass2.locals.foo = displayClass2.bar; //And this
            });
      
            Debug.Log(displayClass1.foo);
        }
    }

6892841--806192--upload_2021-3-2_12-21-27.png

Compiled code itself works as expected, issue here is that Roslyn generates bad code, but as I already told, Unity should be aware of these issues with DisplayClass. As we discussed these locals issues 9 month’s ago with Joel.
6892841--806189--upload_2021-3-2_12-17-56.png

6892841--806186--upload_2021-3-2_12-16-9.png

6 Likes

Thanks for taking the time to do some digging on this @eizenhorn !

Ah, I think we’ve had a bit of a miscommunication here somewhere. I thought that @VergilUa was saying that this would happen in normal C# without any of the DOTS rewriting that’s happening, so that’s what I was trying to disprove. What you’re showing is how it happens in the rewritten output. When I described this as a codegen bug, I was assuming that this was the type of thing that was happening - it was rewriting into something with incorrect behaviour.

I don’t know why I didn’t think to look at the decompiled output last night! The difference is quite clear.

Without braces:

public void ReadFromDisplayClass(ref <>c__DisplayClass0_0 displayClass)
{
    foo = displayClass.foo;
}

public void WriteToDisplayClass(ref <>c__DisplayClass0_0 displayClass)
{
    displayClass.foo = foo;
}
displayClass.bar = 1f;

With braces:

public void ReadFromDisplayClass(ref <>c__DisplayClass0_1 displayClass)
{
    bar = displayClass.bar;
    foo = displayClass.CS$<>8__locals1.foo;
}

public void WriteToDisplayClass(ref <>c__DisplayClass0_1 displayClass)
{
    displayClass.bar = bar;
    displayClass.CS$<>8__locals1.foo = foo;
}
<>c__DisplayClass0_1 displayClass2 = default(<>c__DisplayClass0_1);
displayClass2.CS$<>8__locals1 = displayClass;
displayClass2.bar = 1f;

So yes, this explains the different behaviour quite clearly. I’m having a bit of a dig through the source, although I don’t think I can justify putting too much time into this with no tangible outcome.

Unity.Entities.CodeGen/LambdaJobs is the place to look into how all of this happens, and the EntitiesForEachJobCreator.cs and LambdaJobsPostProcess.cs files are the main ones responsible for the relevant parts. They actually have pretty good comments that document the whole process.

I believe that I understand why this breaks in Unity’s version and not in plain C#!

In standard Roslyn output, these DisplayClass types are classes. Nested scopes reference outer scopes by reference. Unity does a transformation that turns these DisplayClasses intro structs, so the references between them are now by value. This means that changes to captured scopes don’t propagate back out to their parents as they should.

If you look at this Roslyn output and imagine just changing the generated DisplayClasses to structs, it’s easy to see how it would break: https://sharplab.io/#v2:CYLg1APgAgTAjAWAFBQMwAJboMLoN7LpGYZQAs6AsgBQCU6hxBSxr6AlgHYAu6AxgEMADgEF0AXnRwA3Izb5Wc+cS69BQgEIT0MWS2WL9BogBEA9gBUAFlwDm1OuIB8zY8vVjJcdGH7CNem5EAL60gcrBSoaskUYMcWiYFObWdtRQcDDo3DactrRKrso5aWFKscFAA==

I’ve only had a cursory think about this, but I think that this is actually a bug in Unity’s transformations and can be fixed! The transformed DisplayClasses just need one further transformation - to reference the locals scope by pointer.

There may well be a reason why that’s not a viable transformation, but they don’t seem to reference this issue at all in their otherwise extensive documentation comments, so it appears that it may be an oversight.

I just realised that you already worked out and said pretty much everything that I worked out. Sorry! I had to go through it myself to fully get my head around it.

There’s probably some obvious reason why we can’t transform the references between the display structs into pointers, but I haven’t worked out what it is yet.