NativeList/Array vs C# List editor performance

Hi,

I am working on optimising part of our code that contains a large loop, with the aim of eventually Burst-compiling some parts of it (sadly I cannot Burst the entire loop, as there are places that unavoidably reference managed classes). As a first step towards this I have converted the data class we were using to a struct, and the Lists to NativeLists. This of course isn’t the primary optimisation I was planning, but I expected that not having to dereference all the class instances in the List would improve performance. Instead, in the editor at least, performance is notably worse (~4x slowdown).

To investigate what was happening I added some basic profiling around different data structures. On device, NativeArray is very noticeably faster than the other options I tried (List and NativeList), but in the editor both NativeList and NativeArray are notably slower than List.


A comparison of editor vs device performance. In the editor both Native containers are comparable, and between 4 and 5 times slower than List. On device the NativeList is around 2 times slower to add and 4 times faster to access than List, whereas the NativeArray is 10 times faster to add and 20 times faster to access.

For adding elements the NativeList is slower in both cases, although by more in editor, but the NativeArray is very different on device performance vs editor performance.

Is there anything that can be done to get the editor performance to more closely match the device? Because the code is called so frequently, the noticeable slowdown the NativeArray causes has a drastic effect on our editor performance, but it’s so much faster on device that we would of course prefer to use this method.

Here is the code I am using for the simple profiling test

        private List<int> _list;
        private NativeList<int> _nativeList;
        private NativeArray<int> _nativeArray;
        
        private void PostManagersInit()
        {
            int iterations = 1000000;

            // Benchmark C# List<T>
            _list = new List<int>(iterations);
            var startTime = Time.realtimeSinceStartup;
            for (int i = 0; i < iterations; i++)
            {
                _list.Add(i);
            }
            var endTime = Time.realtimeSinceStartup;
            Dbg.Log(LogType.Manager, $"C# List<T> Add:     {endTime - startTime} seconds");

            // Benchmark NativeList<T>
            _nativeList = new NativeList<int>(iterations, Allocator.Temp);
            startTime = Time.realtimeSinceStartup;
            for (int i = 0; i < iterations; i++)
            {
                _nativeList.Add(i);
            }
            endTime = Time.realtimeSinceStartup;
            Dbg.Log(LogType.Manager, $"NativeList<T> Add:  {endTime - startTime} seconds");
            
            // Benchmark NativeArray<T>
            _nativeArray = new NativeArray<int>(iterations, Allocator.Temp);
            startTime = Time.realtimeSinceStartup;
            for (int i = 0; i < iterations; i++)
            {
                _nativeArray[i] = i;
            }
            endTime = Time.realtimeSinceStartup;
            Dbg.Log(LogType.Manager, $"NativeArray<T> Add: {endTime - startTime} seconds");
            
            Dbg.Log(LogType.Manager, $"--------------------------------------------------");
            
            // Benchmark C# List<T>
            startTime = Time.realtimeSinceStartup;
            for (int i = 0; i < iterations; i++)
            {
                _list[i] = i + 1;
            }
            endTime = Time.realtimeSinceStartup;
            Dbg.Log(LogType.Manager, $"C# List<T> Update:     {endTime - startTime} seconds");
            
            // Benchmark NativeList<T>
            startTime = Time.realtimeSinceStartup;
            for (int i = 0; i < iterations; i++)
            {
                _nativeList[i] = i + 1;
            }
            endTime = Time.realtimeSinceStartup;
            Dbg.Log(LogType.Manager, $"NativeList<T> Update:  {endTime - startTime} seconds");
            
            // Benchmark NativeArray<T>
            startTime = Time.realtimeSinceStartup;
            for (int i = 0; i < iterations; i++)
            {
                _nativeArray[i] = i + 1;
            }
            endTime = Time.realtimeSinceStartup;
            Dbg.Log(LogType.Manager, $"NativeArray<T> Update: {endTime - startTime} seconds");

            // Release both native containers, not just the list.
            _nativeList.Dispose();
            _nativeArray.Dispose();
        }

AFAIK Native collections do generally perform worse in editor/regular script code etc.; it’s in Burst that they shine a bit more. Possibly this is partly due to all the extra safety checks. I don’t think they can be disabled outside of Bursted code (and production builds?)

Also, something to consider about NativeList, if you know the capacity up front, you can use AddNoResize* instead of Add to skip the capacity check. (As an aside, AddNoResize should really have been named AddNoAlloc or something)
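
A minimal sketch of that suggestion, assuming the Unity.Collections package is installed. The class and method names (AddNoResizeSketch, FillWithoutResize) are made up for illustration; only NativeList<T> and AddNoResize come from the package.

```csharp
using Unity.Collections;

// Hypothetical helper, not from the original post.
public static class AddNoResizeSketch
{
    public static int FillWithoutResize(int count)
    {
        // Reserve the full capacity up front so the list never needs to grow.
        var list = new NativeList<int>(count, Allocator.Temp);

        for (int i = 0; i < count; i++)
        {
            // Appends without attempting to resize; the caller is responsible
            // for making sure the capacity is sufficient.
            list.AddNoResize(i);
        }

        int length = list.Length;
        list.Dispose();
        return length;
    }
}
```

This only removes the resize path from Add. It does not remove the safety checks that apply outside Burst, so it may not change the editor numbers much on its own.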

Also, if you’re working with unmanaged data you don’t necessarily need to switch from List to NativeList (at least not as long as you’re not planning on adding to the list while you’re processing it). See [Open Source] ViewAsNativeArray utility, use lists and arrays in jobs

Without Burst and in the editor, the safety checks make the native containers slower. However, if this is actually your editor bottleneck (and not whatever you are fetching the data from), then there’s probably a way to refactor your code to use Burst right away. But I would need to see your real code to help you with that. Your little benchmark can be trivially made Burst-compiled.

Hello @ForthstarAdmin

Just looked over your post together with @tim_jones from my team, thanks for writing @ForthstarAdmin! (and everyone else for helping)

Why your benchmark indicates slow Editor performance:
The benchmark code that you shared here isn’t Burst compiled (which can be achieved via the [BurstCompile] attribute). Because it’s not being Burst compiled this means that the NativeArray and NativeList benchmarks aren’t showing how they’d perform with Burst. Rather your benchmark code is being run in Mono (which is slow) in the Editor, and on IL2CPP in the Android Build.

The reason your benchmarks are showing faster performance in the Android build (compared to the Editor) is because IL2CPP is optimizing your code better than Mono. However, Burst should be able to optimize your code quite a bit better than both of them.

Our suggestion:
Our suggestion is to refactor the benchmark code so that the native collection parts sit inside a [BurstCompile] entry point, meaning the code you’re benchmarking is compiled with Burst. Your benchmark would then show how NativeArray / NativeList perform when Bursted (and should show an improvement in performance both in the Android build and in the Editor!)
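
As a rough sketch of what that could look like for the NativeArray part of the benchmark, assuming the Burst, Collections and Jobs packages are available. FillArrayJob, BurstBenchmarkSketch and TimeFillMilliseconds are hypothetical names, not anything from the original code, and Stopwatch is swapped in for Time.realtimeSinceStartup purely as a timing choice.

```csharp
using System.Diagnostics;
using Unity.Burst;
using Unity.Collections;
using Unity.Jobs;

// CompileSynchronously asks the editor to wait for Burst instead of
// running the managed fallback while compilation happens in the background.
[BurstCompile(CompileSynchronously = true)]
public struct FillArrayJob : IJob
{
    public NativeArray<int> Values;

    public void Execute()
    {
        // Same work as the NativeArray loop in the original benchmark.
        for (int i = 0; i < Values.Length; i++)
        {
            Values[i] = i;
        }
    }
}

// Hypothetical wrapper, not from the original post.
public static class BurstBenchmarkSketch
{
    public static double TimeFillMilliseconds(int iterations)
    {
        // TempJob is used here because the array is handed to a job.
        var values = new NativeArray<int>(iterations, Allocator.TempJob);
        var job = new FillArrayJob { Values = values };

        // Warm-up run so that compilation cost is not part of the measurement.
        job.Run();

        var stopwatch = Stopwatch.StartNew();
        job.Run(); // executes immediately on the calling thread
        stopwatch.Stop();

        values.Dispose();
        return stopwatch.Elapsed.TotalMilliseconds;
    }
}
```

Calling BurstBenchmarkSketch.TimeFillMilliseconds(1000000) from the same place as the original test should give a number that can be compared against the List loop. A NativeList version would follow the same pattern with the list as the job field.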

Hope this helps! Feel free to share how it goes.
