A couple of general thoughts on what to expect:
- In the editor, NativeArray has debugging overhead compared to builtin arrays:
  - We detect race conditions.
  - In IJobParallelFor we detect writes to the wrong ranges of indices.
- The Mono JIT has dedicated instructions for builtin array access, so under Mono, NativeArray can’t quite match the speed of builtin array lookups. With IL2CPP, however, NativeArray is on par with or better than builtin arrays. We expect that these days most of our users ship the final game with IL2CPP for the best performance, so please measure with IL2CPP in a standalone player. (Also see the note below for a recent build with some optimizations that will make it into 18.1.)
- The job scheduler in Unity has significantly less overhead. The best way to measure this is to schedule a bunch of empty jobs. Again, the editor has quite a bit of overhead due to race condition detection, so it's important to measure in a standalone player. There are three important things to measure:
  - GC allocations caused by scheduling a job. Our view is that keeping them at zero is critical to avoid GC collections later on. We do that; ParallelTasks very much does not.
  - The cost to actually schedule and execute a job.
  - The cost of running in harmony with other engine threads (reducing context switch cost). The Unity job system for C# is the same job system the engine code uses, which allows for greater integration and no context switch cost.
- Ultimately, neither Mono nor IL2CPP performance really matters. The compiler we expect all users to use for C# jobs is Burst. It will NOT be available in 18.1, but likely in 18.2. Burst itself does not know what a builtin array is. Essentially, Burst is a compiler dedicated to C# jobs and a specific subset of C#, built to get the absolute best performance you could hope for. For this reason we assume that there are no GC types at all in the kind of code Burst executes. Hence everything is native containers plus structs. This is part of what enables the 5x-10x speedups we usually see with Burst vs Mono/IL2CPP. We also generally beat C++ performance by good margins with Burst already.
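To make the "schedule a bunch of empty jobs" measurement concrete, here is a minimal sketch of how it could look. The class name `EmptyJobBenchmark`, the job count, and the use of `GC.GetTotalMemory` as a rough allocation indicator are my assumptions for illustration, not an official benchmark. Run it in a standalone player, not the editor.

```csharp
using System;
using System.Diagnostics;
using Unity.Jobs;
using UnityEngine;

// Hypothetical benchmark component; attach it to a GameObject
// and run it in a standalone player build.
public class EmptyJobBenchmark : MonoBehaviour
{
    // An empty job: it does no work, so the timing below reflects
    // scheduling + execution overhead only.
    struct EmptyJob : IJob
    {
        public void Execute() { }
    }

    // Arbitrary count chosen for illustration.
    const int JobCount = 10000;

    void Start()
    {
        // Warm up once so one-time initialization is not counted.
        new EmptyJob().Schedule().Complete();

        // Rough indicator only: if a collection happens mid-run the
        // delta can be misleading. The profiler gives a better picture.
        long memBefore = GC.GetTotalMemory(false);
        Stopwatch timer = Stopwatch.StartNew();

        for (int i = 0; i < JobCount; i++)
        {
            // Schedule one empty job and wait for it to finish.
            JobHandle handle = new EmptyJob().Schedule();
            handle.Complete();
        }

        timer.Stop();
        long memAfter = GC.GetTotalMemory(false);

        UnityEngine.Debug.Log(
            "Empty jobs: " + JobCount +
            ", total ms: " + timer.Elapsed.TotalMilliseconds +
            ", managed memory delta (bytes): " + (memAfter - memBefore));
    }
}
```

Completing each job right after scheduling it measures the schedule + execute round trip. Scheduling all jobs first and completing them afterwards would instead measure throughput, which is a different number.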
You probably want to watch this for a more complete overview of what we are aiming at:
It would be great if you can share the specific benchmark you made so we can take a look.
Note on point 2 (the IL2CPP comparison): these IL2CPP optimizations are not yet in the beta. Here is a build from a branch that will soon make it into the official beta builds, so you can run the benchmark tests today:
https://beta.unity3d.com/download/966b48dc5f14/public_download.html
(This build has not gone through QA, so I don't recommend using it for anything beyond benchmarking.)
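For the NativeArray vs builtin array comparison, a benchmark along these lines is what I would expect to be meaningful in an IL2CPP standalone build. This is only a sketch: the class name `ArrayAccessBenchmark`, the array length, and the iteration count are made up for illustration.

```csharp
using System.Diagnostics;
using Unity.Collections;
using UnityEngine;

// Hypothetical benchmark component; run it in an IL2CPP standalone player.
public class ArrayAccessBenchmark : MonoBehaviour
{
    // Arbitrary sizes chosen for illustration.
    const int Length = 1000000;
    const int Iterations = 100;

    void Start()
    {
        int[] managed = new int[Length];
        NativeArray<int> native = new NativeArray<int>(Length, Allocator.Persistent);

        // Fill both arrays with the same small values.
        for (int i = 0; i < Length; i++)
        {
            managed[i] = i & 0xFF;
            native[i] = i & 0xFF;
        }

        // Sum the builtin array.
        long sumManaged = 0;
        Stopwatch timer = Stopwatch.StartNew();
        for (int it = 0; it < Iterations; it++)
            for (int i = 0; i < Length; i++)
                sumManaged += managed[i];
        timer.Stop();
        double managedMs = timer.Elapsed.TotalMilliseconds;

        // Sum the NativeArray with the same loop shape.
        long sumNative = 0;
        timer = Stopwatch.StartNew();
        for (int it = 0; it < Iterations; it++)
            for (int i = 0; i < Length; i++)
                sumNative += native[i];
        timer.Stop();
        double nativeMs = timer.Elapsed.TotalMilliseconds;

        // Native containers are not garbage collected; release explicitly.
        native.Dispose();

        // Log the sums too, so the loops cannot be optimized away.
        UnityEngine.Debug.Log(
            "builtin: " + managedMs + " ms (sum " + sumManaged + "), " +
            "NativeArray: " + nativeMs + " ms (sum " + sumNative + ")");
    }
}
```

Running the same component in the editor should show the safety-check overhead described above, so the editor numbers are not representative of a deployed game.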