Is NativeArray slower than a regular array? (2018.1.0b6)

I was testing the new job system and noticed that calculations with NativeArrays run slower than the same calculations with regular arrays. I tested three cases of the same velocity calculation:

1 - jobified IJobParallelFor
2 - the same as 1 but running single-threaded with positions and velocities stored in NativeArray
3 - the same as 2 but with regular arrays instead of native arrays.

The results on my computer for 100k points are:
1 - 60 FPS; 2 - 34 FPS; 3 - 79 FPS.

This shows that single-threaded calculations with regular arrays run at about 2.3x the frame rate of the same calculations using NativeArray. IJobParallelFor raises the frame rate by about 1.8x compared to single-threaded NativeArray usage, but that speedup is not enough to reach the frame rate I get with regular arrays.
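One caveat when reading these numbers: FPS is the reciprocal of the whole frame time, which also includes rendering and Editor overhead, so FPS ratios only approximate the cost of the calculation itself. A minimal sketch of the conversion follows; the `FrameTimes` class and its method names are made up for illustration and are not part of the test project.

```csharp
using System;

// Hypothetical helper, not part of the original test project.
public static class FrameTimes
{
    // Frame time in milliseconds for a given frames-per-second value.
    public static double FrameMs(double fps)
    {
        return 1000.0 / fps;
    }

    // Ratio of frame times: how many times shorter the frame is
    // at 'fpsFast' than at 'fpsSlow'.
    public static double Speedup(double fpsSlow, double fpsFast)
    {
        return FrameMs(fpsSlow) / FrameMs(fpsFast);
    }
}
```

For the results above, 60, 34 and 79 FPS correspond to roughly 16.7, 29.4 and 12.7 ms per frame.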

Am I missing something here?

The full code is below:

using System.Collections;
using System.Collections.Generic;
using UnityEngine;
using Unity.Jobs;
using Unity.Collections;

public class JobDemo1 : MonoBehaviour {

    struct VelocityJob1 : IJobParallelFor
    {
        [ReadOnly]
        public NativeArray<Vector3> velocity;
        public NativeArray<Vector3> position;
        public float deltaTime;
        public void Execute(int i)
        {
            position[i] = position[i] + velocity[i] * deltaTime;
        }
    }
   
    struct VelocityJob2
    {
        [ReadOnly]
        public NativeArray<Vector3> velocity;
        public NativeArray<Vector3> position;
        public float deltaTime;
        public void Execute(int i)
        {
            position[i] = position[i] + velocity[i] * deltaTime;
        }
    }
   
    struct VelocityJob3
    {
        [ReadOnly]
        public Vector3[] velocity;
        public Vector3[] position;
        public float deltaTime;
        public void Execute(int i)
        {
            position[i] = position[i] + velocity[i] * deltaTime;
        }
    }
   
    NativeArray<Vector3> position1;
    NativeArray<Vector3> velocity1;

    Vector3[] position3;
    Vector3[] velocity3;

    void Start(){
        int n = 100000;

        position1 = new NativeArray<Vector3>(n, Allocator.Persistent);
        velocity1 = new NativeArray<Vector3>(n, Allocator.Persistent);

        position3 = new Vector3[n];
        velocity3 = new Vector3[n];

        for (var i = 0; i < velocity1.Length; i++){
            velocity1[i] = new Vector3(0, 10, 0);
            velocity3[i] = new Vector3(0, 10, 0);
        }
    }

    int jobMode = 1;
    void Update(){
        if(jobMode == 1){
            Update1();
        }
        else if(jobMode == 2){
            Update2();
        }
        else if(jobMode == 3){
            Update3();
        }

        if(Input.GetKeyDown(KeyCode.A)){
            jobMode = 1;
            Debug.Log("jobMode "+jobMode);
        }
        else if(Input.GetKeyDown(KeyCode.B)){
            jobMode = 2;
            Debug.Log("jobMode "+jobMode);
        }
        else if(Input.GetKeyDown(KeyCode.C)){
            jobMode = 3;
            Debug.Log("jobMode "+jobMode);
        }
    }

    
    void Update1(){
        int processorCount = System.Environment.ProcessorCount;

        var job = new VelocityJob1()
        {
            deltaTime = Time.deltaTime,
            position = position1,
            velocity = velocity1
        };
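        // Note: the second argument of Schedule is the inner-loop batch count
        // (indices processed per work batch), not a thread count.
        JobHandle jobHandle = job.Schedule(position1.Length, processorCount);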
        jobHandle.Complete();
    }
   
    void Update2(){
        var job = new VelocityJob2(){
            deltaTime = Time.deltaTime,
            position = position1,
            velocity = velocity1
        };
               
        for(int i=0; i<position1.Length; i++){
            job.Execute(i);
        }
    }
   
    void Update3(){
        var job = new VelocityJob3(){
            deltaTime = Time.deltaTime,
            position = position3,
            velocity = velocity3
        };
       
        for(int i=0; i<position3.Length; i++){
            job.Execute(i);
        }
    }
   
    void OnApplicationQuit(){
        position1.Dispose();
        velocity1.Dispose();
    }
   
}
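
A side note on the `Schedule` call in `Update1`: its second argument is the inner-loop batch count, i.e. how many consecutive indices each work batch processes, not the number of threads. Passing the processor count therefore produces very small batches. Below is a sketch of the jobified case with a larger batch size. The class name `BatchedVelocityDemo` is made up, and the value 1024 is an untuned assumption; the right value has to be measured.

```csharp
using UnityEngine;
using Unity.Jobs;
using Unity.Collections;

// Hypothetical stand-alone variant of the jobified case above.
public class BatchedVelocityDemo : MonoBehaviour
{
    struct VelocityJob : IJobParallelFor
    {
        [ReadOnly] public NativeArray<Vector3> velocity;
        public NativeArray<Vector3> position;
        public float deltaTime;

        public void Execute(int i)
        {
            position[i] = position[i] + velocity[i] * deltaTime;
        }
    }

    // Assumed, untuned value: number of indices each work batch processes.
    const int InnerloopBatchCount = 1024;

    NativeArray<Vector3> position;
    NativeArray<Vector3> velocity;

    void Start()
    {
        position = new NativeArray<Vector3>(100000, Allocator.Persistent);
        velocity = new NativeArray<Vector3>(100000, Allocator.Persistent);
        for (int i = 0; i < velocity.Length; i++)
        {
            velocity[i] = new Vector3(0, 10, 0);
        }
    }

    void Update()
    {
        var job = new VelocityJob
        {
            deltaTime = Time.deltaTime,
            position = position,
            velocity = velocity
        };

        // The index range is split into batches of InnerloopBatchCount items;
        // Complete() blocks the main thread until all batches have finished.
        JobHandle handle = job.Schedule(position.Length, InnerloopBatchCount);
        handle.Complete();
    }

    void OnDestroy()
    {
        // Persistent allocations must be released explicitly.
        position.Dispose();
        velocity.Dispose();
    }
}
```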

Any code compiled with the ENABLE_UNITY_COLLECTIONS_CHECKS define has added checks to make sure that you aren’t accessing the array from different jobs at the same time in a way that might be unsafe. This is in addition to the bounds checking that both native and managed arrays do. This is on by default in the Editor.

I haven’t checked, but I think these safety checks (I’m unsure about bounds checking) are disabled in non-development builds. I’ve personally seen a fairly large speedup in such builds.
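
To see which configuration a test is actually running in, something like the following could be attached to any GameObject. It is only a sketch: the component name `SafetyCheckReport` is made up, and it reports whether the define is set for the script assembly it is compiled into.

```csharp
using UnityEngine;

// Hypothetical helper component: logs the configuration the scripts were compiled for.
public class SafetyCheckReport : MonoBehaviour
{
    void Start()
    {
#if ENABLE_UNITY_COLLECTIONS_CHECKS
        Debug.Log("ENABLE_UNITY_COLLECTIONS_CHECKS is defined: collection safety checks are compiled in.");
#else
        Debug.Log("ENABLE_UNITY_COLLECTIONS_CHECKS is not defined.");
#endif
        // Application.isEditor: true when running inside the Editor.
        // Debug.isDebugBuild: true in the Editor and in development builds.
        Debug.Log("Editor: " + Application.isEditor + ", development build: " + Debug.isDebugBuild);
    }
}
```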

OK, you seem to be right. I just built the project with Development Build disabled, and here are the results:
1 - 295 FPS
2 - 156 FPS
3 - 169 FPS

So in a build, the jobified version is the fastest. NativeArray is still slightly slower than a regular array, but I can live with that :) . Is it possible to disable ENABLE_UNITY_COLLECTIONS_CHECKS while in the Editor? I couldn’t find a way so far. It would be a bit odd to have to build the project every time I want to check performance.

[quote=“chanfort, post:3, topic: 692216, username:chanfort”]
It would be a bit odd to have to build the project every time I want to check performance.
[/quote]
We always recommend measuring performance in an actual build, not in the Editor. In the Editor there’s a lot of added overhead that will distort your results, both from things like safety checks and from having to render all the Editor windows in addition to the game view.
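
When measuring in a build, timing the calculation itself is usually more telling than reading FPS, since FPS includes rendering and everything else in the frame. A minimal sketch using `System.Diagnostics.Stopwatch`; the component name `UpdateTimer`, the 120-frame window and the `DoWork` placeholder are assumptions for illustration.

```csharp
using UnityEngine;

// Hypothetical helper: averages the cost of one piece of work over many frames.
public class UpdateTimer : MonoBehaviour
{
    const int SampleFrames = 120; // assumed averaging window

    readonly System.Diagnostics.Stopwatch stopwatch = new System.Diagnostics.Stopwatch();
    int frames;
    float sink; // keeps the placeholder work from being optimized away

    void Update()
    {
        // Start/Stop accumulate elapsed time across frames until Reset is called.
        stopwatch.Start();
        DoWork();
        stopwatch.Stop();

        frames++;
        if (frames == SampleFrames)
        {
            double averageMs = stopwatch.Elapsed.TotalMilliseconds / frames;
            Debug.Log("Average work time over " + frames + " frames: " + averageMs + " ms");
            stopwatch.Reset();
            frames = 0;
        }
    }

    // Placeholder for the code under test (for example one of the Update1/2/3 cases).
    void DoWork()
    {
        float sum = 0f;
        for (int i = 0; i < 100000; i++)
        {
            sum += i * 0.5f;
        }
        sink = sum;
    }
}
```

In a player build the `Debug.Log` output ends up in the player log file.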

First rule of performance testing: always test on the device and in the environment where the code will actually run. The Editor isn’t one of them.

On the other hand, these synthetic tests are mostly useless. Maybe with otherwise empty memory the managed array was slightly faster than the native array, but in a real application, where you’re constantly allocating and freeing things on the heap, the managed array quickly becomes slower, unless you allocate everything when the application starts up, which may or may not be feasible for your use case.
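
The "allocate everything at startup" approach could look roughly like this for a NativeArray. This is only a sketch: the component name `PersistentBuffers` and the element count are assumptions.

```csharp
using UnityEngine;
using Unity.Collections;

// Hypothetical component: allocates its buffer once and reuses it every frame.
public class PersistentBuffers : MonoBehaviour
{
    const int Count = 100000; // assumed size

    NativeArray<Vector3> positions;

    void Awake()
    {
        // Allocator.Persistent is meant for allocations that live across many frames.
        positions = new NativeArray<Vector3>(Count, Allocator.Persistent);
    }

    void Update()
    {
        // Reuse the same buffer; nothing is allocated per frame.
        for (int i = 0; i < positions.Length; i++)
        {
            positions[i] = positions[i] + Vector3.up * Time.deltaTime;
        }
    }

    void OnDestroy()
    {
        // Native memory is not garbage collected, so it has to be disposed explicitly.
        if (positions.IsCreated)
        {
            positions.Dispose();
        }
    }
}
```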

I stopped caring about these synthetic tests a long time ago and just investigate my application instead. Choose whatever works for you in your environment and fits your style. On the other hand, if you choose NativeArray, you may get more stable performance, plus the possibility of using the same data structure in single- and multi-threaded jobs.
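
As an illustration of that last point, the same job struct and the same NativeArray can be run either on the main thread or on worker threads. This is a sketch, assuming the `Run` and `Schedule` extension methods for `IJobParallelFor` are available in your Unity version; the names `SingleOrMultiThreaded` and `ScaleJob` and the batch size of 64 are made up.

```csharp
using UnityEngine;
using Unity.Jobs;
using Unity.Collections;

// Hypothetical component: one job struct, two ways of executing it.
public class SingleOrMultiThreaded : MonoBehaviour
{
    struct ScaleJob : IJobParallelFor
    {
        public NativeArray<float> values;
        public float factor;

        public void Execute(int i)
        {
            values[i] = values[i] * factor;
        }
    }

    public bool multiThreaded = true;

    NativeArray<float> values;

    void Awake()
    {
        values = new NativeArray<float>(1000, Allocator.Persistent);
    }

    void Update()
    {
        var job = new ScaleJob { values = values, factor = 1.0f };

        if (multiThreaded)
        {
            // Split the work into batches of 64 indices (assumed value) and wait for completion.
            job.Schedule(values.Length, 64).Complete();
        }
        else
        {
            // Execute the same job immediately on the calling thread.
            job.Run(values.Length);
        }
    }

    void OnDestroy()
    {
        values.Dispose();
    }
}
```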

Please check out this post for a lot of detail on performance and on how, what, and where to profile:

Thanks!

Thanks Joachim for the link, now it’s clearer what’s going on.

In the meantime I managed to jobify a kd-tree neighbour search (1.9x speedup), which shows that even complex recursive trees can be jobified. I think the kd-tree is the fastest of the neighbour search methods, and the new job system with the upcoming Burst compiler looks very promising.
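
For anyone who wants to try the combination, here is a rough sketch of what a Burst-compiled job can look like. It assumes the Burst package is installed and that the `[BurstCompile]` attribute from `Unity.Burst` is available in your version; this is not the kd-tree code, just the velocity update from the first post reduced to floats, and the name `BurstSketch` is made up.

```csharp
using UnityEngine;
using Unity.Burst;
using Unity.Collections;
using Unity.Jobs;

// Hypothetical component; requires the Burst package.
public class BurstSketch : MonoBehaviour
{
    // The attribute asks Burst to compile this job to native code.
    [BurstCompile]
    struct IntegrateJob : IJobParallelFor
    {
        [ReadOnly] public NativeArray<float> velocity;
        public NativeArray<float> position;
        public float deltaTime;

        public void Execute(int i)
        {
            position[i] = position[i] + velocity[i] * deltaTime;
        }
    }

    NativeArray<float> position;
    NativeArray<float> velocity;

    void Awake()
    {
        position = new NativeArray<float>(100000, Allocator.Persistent);
        velocity = new NativeArray<float>(100000, Allocator.Persistent);
        for (int i = 0; i < velocity.Length; i++)
        {
            velocity[i] = 10f;
        }
    }

    void Update()
    {
        var job = new IntegrateJob
        {
            velocity = velocity,
            position = position,
            deltaTime = Time.deltaTime
        };
        // Batch size of 1024 is an untuned assumption.
        job.Schedule(position.Length, 1024).Complete();
    }

    void OnDestroy()
    {
        position.Dispose();
        velocity.Dispose();
    }
}
```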
