Vector3 addition is inefficient

Hi. I have a simple Vector3 addition where the result is the sum of two vectors.
This takes 84.73 ms for 98304 executions.

However, if I add the two vectors component by component, I get:
10.43 ms for 98304 executions.
That’s a huge improvement. The question is: why?
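One plausible part of the answer is that a + b is not a primitive instruction: it is a call to a static operator + method that builds a new struct from the component sums, so unless the JIT inlines that call, each addition pays call overhead. The sketch below uses a hypothetical Vec3 struct as a stand-in for UnityEngine.Vector3 (its operator + is assumed to mirror Unity's) to show that both forms compute the same thing; only the call structure differs.

```csharp
// Hypothetical stand-in for UnityEngine.Vector3 so the sketch compiles outside Unity.
public struct Vec3
{
    public float x, y, z;

    public Vec3(float x, float y, float z) { this.x = x; this.y = y; this.z = z; }

    // Assumed to mirror Vector3's operator +: a static method that returns a new struct.
    public static Vec3 operator +(Vec3 a, Vec3 b) => new Vec3(a.x + b.x, a.y + b.y, a.z + b.z);
}

public static class AdditionForms
{
    // Operator form: compiles to a call to the op_Addition method on Vec3.
    public static Vec3 ViaOperator(Vec3 a, Vec3 b) => a + b;

    // Component-wise form: the three float additions are written out, no operator call.
    public static Vec3 Componentwise(Vec3 a, Vec3 b) => new Vec3(a.x + b.x, a.y + b.y, a.z + b.z);
}
```

Both return (5, 7, 9) for a = (1, 2, 3) and b = (4, 5, 6); in a timing loop the only difference is the extra method call, which the JIT may or may not inline.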

I also tried writing my own addition function, with and without “ref”.
The result is 33 ms for 98304 iterations.
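A custom addition helper can take its operands by value (each call copies two 12-byte structs and returns a new one) or by reference. The sketch below shows both shapes with a hypothetical Vec3 stand-in for UnityEngine.Vector3; the "in" parameters (C# 7.2+) pass by read-only reference. Whether either variant is faster depends on whether the JIT inlines the call, so treat it as something to measure rather than a guaranteed win.

```csharp
// Hypothetical stand-in for UnityEngine.Vector3 (three float fields, 12 bytes).
public struct Vec3
{
    public float x, y, z;
    public Vec3(float x, float y, float z) { this.x = x; this.y = y; this.z = z; }
}

public static class CustomAdd
{
    // By value: both operands are copied into the call and a new struct is returned.
    public static Vec3 Add(Vec3 a, Vec3 b) => new Vec3(a.x + b.x, a.y + b.y, a.z + b.z);

    // By reference: operands are read through read-only references and the
    // result is written straight into the caller's variable.
    public static void Add(in Vec3 a, in Vec3 b, ref Vec3 result)
    {
        result.x = a.x + b.x;
        result.y = a.y + b.y;
        result.z = a.z + b.z;
    }
}
```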

Edit: I actually needed to add 3 coordinates to a vector. Removing one of the vectors and adding the 3 coordinates one by one leads to an amazing result:
4.3 ms for 98304 executions

That’s an almost 20x improvement! It is significant, especially if this function is called far more often than 100k times.
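The two shapes described in the edit can be sketched like this: building a temporary vector from the three coordinates and adding it, versus adding the coordinates directly to the fields. Vec3 is again a hypothetical stand-in for UnityEngine.Vector3 and OffsetByCoordinates is a made-up name; both produce the same result, but the second skips the temporary struct and the operator call.

```csharp
// Hypothetical stand-in for UnityEngine.Vector3 so the sketch compiles outside Unity.
public struct Vec3
{
    public float x, y, z;
    public Vec3(float x, float y, float z) { this.x = x; this.y = y; this.z = z; }
    public static Vec3 operator +(Vec3 a, Vec3 b) => new Vec3(a.x + b.x, a.y + b.y, a.z + b.z);
}

public static class OffsetByCoordinates
{
    // Builds a temporary vector from the coordinates, then calls operator +.
    public static Vec3 WithTemporary(Vec3 v, float dx, float dy, float dz) => v + new Vec3(dx, dy, dz);

    // Adds each coordinate straight to the matching field of the caller's vector.
    public static void InPlace(ref Vec3 v, float dx, float dy, float dz)
    {
        v.x += dx;
        v.y += dy;
        v.z += dz;
    }
}
```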

All these figures were obtained with the Deep Profiler enabled.
Thoughts?

Kind regards,

Perhaps somebody from the Unity team could offer some guidance here? I would like to know how to structure my code around this.

Kind regards,

Is it the same in a build as well?

*This used to be an old tip (avoid vector operations), and it is probably still valid:


Can you also share the two different portions of test code?

If I do this:

public static void Vector3AdditionTest()
{
	var stopwatch = Stopwatch.StartNew();
	Vector3 a = Vector3.zero;
	Vector3 b = Vector3.one;

	for (int i = 0; i < 1000000; i++)
	{
		float x = a.x + b.x;
		float y = a.y + b.y;
		float z = a.z + b.z;
		Vector3 c = new Vector3 (x, y, z);
	}

	stopwatch.Stop();
	Debug.Log(stopwatch.ElapsedTicks);
}

It ends up twice as slow as just doing this:

public static void Vector3AdditionTest()
{
	var stopwatch = Stopwatch.StartNew();
	Vector3 a = Vector3.zero;
	Vector3 b = Vector3.one;

	for (int i = 0; i < 1000000; i++)
	{
		Vector3 c = a + b;
	}

	stopwatch.Stop();
	Debug.Log(stopwatch.ElapsedTicks);
}

Though such a large number of iterations sounds like a job for Burst and/or Jobs.
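One caveat with both loops above: c is never read, so an optimizing JIT is free to drop the addition (and perhaps most of the loop), and the timings may not measure the addition at all. A common guard is to make each result feed the next iteration and consume the final value afterwards. The sketch below does that with plain System.Diagnostics.Stopwatch; Vec3 is a hypothetical stand-in for UnityEngine.Vector3 and SinkBenchmark is a made-up name.

```csharp
using System.Diagnostics;

// Hypothetical stand-in for UnityEngine.Vector3 so the sketch compiles outside Unity.
public struct Vec3
{
    public float x, y, z;
    public Vec3(float x, float y, float z) { this.x = x; this.y = y; this.z = z; }
    public static Vec3 operator +(Vec3 a, Vec3 b) => new Vec3(a.x + b.x, a.y + b.y, a.z + b.z);
}

public static class SinkBenchmark
{
    // Runs the addition n times and returns a checksum so the work cannot be discarded.
    public static float Run(int n, out long elapsedTicks)
    {
        var a = new Vec3(0, 0, 0);
        var b = new Vec3(1, 1, 1);
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < n; i++)
        {
            a = a + b; // each iteration depends on the previous result
        }
        stopwatch.Stop();
        elapsedTicks = stopwatch.ElapsedTicks;
        return a.x + a.y + a.z; // the caller consumes this value
    }
}
```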


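The Deep Profiler makes these head-to-head comparisons pretty useless: it instruments every C# method call, so code that goes through operator + pays that profiling overhead on each call, while hand-written component arithmetic largely avoids it.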

It’s not a bad idea to try using the Performance Testing Package to experiment with this kind of thing.

using NUnit.Framework;
using System.Runtime.CompilerServices;
using Unity.PerformanceTesting;
using UnityEngine;

public class NewTestScript2
{
    const int n = 1024 * 1024;
    static Vector3 c { get; set; }

    [Test, Performance]
    public void Add_Vector3_Vector3_CustomOperator()
    {
        Measure.Method(static () =>
        {
            Vector3 a = new(1, 2, 3), b = new(4, 5, 6);
            for (int i = 0; i < n; i++)
            {
                c = Add(a, b);
            }
        })
            .WarmupCount(8)
            .DynamicMeasurementCount()
            .SampleGroup(new SampleGroup(nameof(Add_Vector3_Vector3_CustomOperator), SampleUnit.Microsecond, false))
            .Run();
        static Vector3 Add(Vector3 a, Vector3 b)
        {
            return new Vector3(a.x + b.x, a.y + b.y, a.z + b.z);
        }
    }

    [Test, Performance]
    public void Add_Vector3_Vector3_CustomOperator_AggressiveInlining()
    {
        Measure.Method(static () =>
        {
            Vector3 a = new(1, 2, 3), b = new(4, 5, 6);
            for (int i = 0; i < n; i++)
            {
                c = Add(a, b);
            }
        })
            .WarmupCount(8)
            .DynamicMeasurementCount()
            .SampleGroup(new SampleGroup(nameof(Add_Vector3_Vector3_CustomOperator_AggressiveInlining), SampleUnit.Microsecond, false))
            .Run();
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        static Vector3 Add(Vector3 a, Vector3 b)
        {
            return new Vector3(a.x + b.x, a.y + b.y, a.z + b.z);
        }
    }

    [Test, Performance]
    public void Add_Vector3_Vector3_Operator()
    {
        Measure.Method(static () =>
        {
            Vector3 a = new(1, 2, 3), b = new(4, 5, 6);
            for (int i = 0; i < n; i++)
            {
                c = a + b;
            }
        })
            .WarmupCount(8)
            .DynamicMeasurementCount()
            .SampleGroup(new SampleGroup(nameof(Add_Vector3_Vector3_Operator), SampleUnit.Microsecond, false))
            .Run();
    }

    [Test, Performance]
    public void Add_Vector3_Vector3_Componentwise()
    {
        Measure.Method(static () =>
        {
            Vector3 a = new(1, 2, 3), b = new(4, 5, 6);
            for (int i = 0; i < n; i++)
            {
                c = new Vector3(a.x + b.x, a.y + b.y, a.z + b.z);
            }
        })
            .WarmupCount(8)
            .DynamicMeasurementCount()
            .SampleGroup(new SampleGroup(nameof(Add_Vector3_Vector3_Componentwise), SampleUnit.Microsecond, false))
            .Run();
    }

    [Test, Performance]
    public void Add_Vector3_Components_Componentwise()
    {
        Measure.Method(() =>
        {
            Vector3 a = new(1, 2, 3);
            float b0 = 4, b1 = 5, b2 = 6;
            for (int i = 0; i < n; i++)
            {
                c = new Vector3(a.x + b0, a.y + b1, a.z + b2);
            }
        })
            .WarmupCount(8)
            .DynamicMeasurementCount()
            .SampleGroup(new SampleGroup(nameof(Add_Vector3_Components_Componentwise), SampleUnit.Microsecond, false))
            .Run();
    }
}

Add_Vector3_Components_Componentwise in Microseconds
Min:		6259.30 μs
Median:		6284.10 μs
Max:		6348.30 μs
Avg:		6288.89 μs
StdDev:		26.69 μs
SampleCount:	9
Sum:		56600.00 μs

Add_Vector3_Vector3_Componentwise in Microseconds
Min:		6266.30 μs
Median:		6282.20 μs
Max:		6300.80 μs
Avg:		6285.38 μs
StdDev:		12.41 μs
SampleCount:	9
Sum:		56568.40 μs

Add_Vector3_Vector3_CustomOperator in Microseconds
Min:		18952.40 μs
Median:		19220.10 μs
Max:		19467.30 μs
Avg:		19190.06 μs
StdDev:		147.02 μs
SampleCount:	9
Sum:		172710.50 μs

Add_Vector3_Vector3_CustomOperator_AggressiveInlining in Microseconds
Min:		9284.20 μs
Median:		9304.90 μs
Max:		9355.20 μs
Avg:		9310.80 μs
StdDev:		22.99 μs
SampleCount:	9
Sum:		83797.20 μs

Add_Vector3_Vector3_Operator in Microseconds
Min:		9303.00 μs
Median:		9335.90 μs
Max:		9376.10 μs
Avg:		9333.42 μs
StdDev:		25.71 μs
SampleCount:	9
Sum:		84000.80 μs
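For readers without the Performance Testing Package, the same idea (discard a few warm-up runs, then report the median of several samples) can be approximated with System.Diagnostics.Stopwatch. This is a sketch, not the package's API: MiniMeasure and its parameters are hypothetical, loosely mirroring WarmupCount and a fixed measurement count.

```csharp
using System;
using System.Diagnostics;

public static class MiniMeasure
{
    // Runs action warmupCount times untimed (so the JIT has compiled it), then
    // times it sampleCount times and returns the median sample in microseconds.
    public static double MedianMicroseconds(Action action, int warmupCount = 8, int sampleCount = 9)
    {
        if (sampleCount < 1) throw new ArgumentOutOfRangeException(nameof(sampleCount));
        for (int i = 0; i < warmupCount; i++) action();

        var samples = new double[sampleCount];
        var stopwatch = new Stopwatch();
        for (int i = 0; i < sampleCount; i++)
        {
            stopwatch.Restart();
            action();
            stopwatch.Stop();
            // Stopwatch ticks to microseconds (Stopwatch.Frequency is ticks per second).
            samples[i] = stopwatch.ElapsedTicks * 1_000_000.0 / Stopwatch.Frequency;
        }

        Array.Sort(samples);
        return sampleCount % 2 == 1
            ? samples[sampleCount / 2]
            : (samples[sampleCount / 2 - 1] + samples[sampleCount / 2]) / 2.0;
    }
}
```

The median is used rather than the mean so that a single sample disturbed by a GC or OS hiccup does not skew the result.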

As others have said, try Burst, Mathematics and Jobs.

This is the most performant (classic) option in my tests, and I haven’t seen it mentioned above:

var r = Vector3.zero;
for(int i = 0; i < C; i++) {
  r.x = a.x + b.x;
  r.y = a.y + b.y;
  r.z = a.z + b.z;
}

On my machine I get (for 100M iterations):
800 ms for var r = a + b (likely because the operator isn’t inlined)
550 ms for custom component-wise addition var r = new Vector3(a.x + b.x, ...)
280 ms for the example above

I think it’s because the same struct is reused, so the stack size stays the same.
You’d still have to store this result in some data structure, and I believe that is the main bottleneck, depending on the memory locality of your array (or whatever you’re using).

For example, just doing this:

Vector3 r = Vector3.zero;
Vector3[] x = new Vector3[C];
for(int i = 0; i < C; i++) {
  r.x = a.x + b.x;
  r.y = a.y + b.y;
  r.z = a.z + b.z;
  x[i] = r;
}

now costs 600 ms for 100M iterations (including the allocation; it’s 400 ms without it).
Still, that is just 6 nanoseconds per iteration on my 10+ year old i5 processor.

In a game running at 60 FPS you’ve got more than 16 million nanoseconds per frame, so you should be able to comfortably do over 2M iterations per frame. If you need 100M, then you’d have to distribute the load over multiple frames (in this case 50 frames would be enough, which would take just 0.833 seconds of real time).
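The arithmetic behind those numbers, as a small sketch. The 6 ns per iteration is the figure measured above, 60 FPS and 100M iterations are this post's examples, and the whole frame is assumed to be available for this work, which in a real game it won't be. FrameBudget is a made-up name.

```csharp
public static class FrameBudget
{
    // Nanoseconds available in one frame at the given frame rate.
    public static double FrameNanoseconds(double fps) => 1e9 / fps;

    // How many iterations fit in one frame at a given per-iteration cost (rounded down).
    public static long IterationsPerFrame(double fps, double nsPerIteration) =>
        (long)(FrameNanoseconds(fps) / nsPerIteration);

    // Frames needed to finish totalIterations at a chosen per-frame quota (rounded up).
    public static long FramesNeeded(long totalIterations, long iterationsPerFrame) =>
        (totalIterations + iterationsPerFrame - 1) / iterationsPerFrame;
}
```

At 60 FPS a frame is about 16,666,667 ns, which fits about 2,777,777 iterations of 6 ns; with a quota of 2M per frame, 100M iterations need 50 frames, i.e. 50 / 60 ≈ 0.833 s.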

Edit:
Maybe this kind of test is too naive and triggers some unseen optimization; a and b should really be randomized and stored in their own arrays as well.
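A sketch of that less naive setup: fill the a and b arrays with pseudo-random values from a fixed seed, so the inputs differ per iteration and cannot be treated as constants, and write the results into a preallocated output array so allocation stays outside the timed loop. Vec3 is a hypothetical stand-in for UnityEngine.Vector3 and RandomizedAddBenchmark is a made-up name.

```csharp
using System;
using System.Diagnostics;

// Hypothetical stand-in for UnityEngine.Vector3 so the sketch compiles outside Unity.
public struct Vec3
{
    public float x, y, z;
    public Vec3(float x, float y, float z) { this.x = x; this.y = y; this.z = z; }
}

public static class RandomizedAddBenchmark
{
    // Fills an array with pseudo-random vectors; a fixed seed keeps runs comparable.
    public static Vec3[] RandomVectors(int count, int seed)
    {
        var rng = new Random(seed);
        var result = new Vec3[count];
        for (int i = 0; i < count; i++)
            result[i] = new Vec3((float)rng.NextDouble(), (float)rng.NextDouble(), (float)rng.NextDouble());
        return result;
    }

    // Times only the component-wise additions; all arrays are allocated beforehand.
    // Assumes a and b are at least as long as output.
    public static long AddAll(Vec3[] a, Vec3[] b, Vec3[] output)
    {
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < output.Length; i++)
        {
            output[i].x = a[i].x + b[i].x;
            output[i].y = a[i].y + b[i].y;
            output[i].z = a[i].z + b[i].z;
        }
        stopwatch.Stop();
        return stopwatch.ElapsedTicks;
    }
}
```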


I understand that the Deep Profiler may be misleading, so I made a new benchmark using a Stopwatch.

Here is the benchmark, which runs the function 100 million times, with x, y and z all replaced by i:

        stopwatch.Start();

        for (int i=0; i<100000000; i++)
        {
            CalculateFaceVertex(ref FaceVertex, 0, i, i, i);
        }
        stopwatch.Stop();
       
        UnityEngine.Debug.Log(stopwatch.ElapsedMilliseconds); // total ms; Elapsed.Milliseconds wraps every second
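A note on reading the timer: TimeSpan.Milliseconds is only the millisecond component (0 to 999) of the elapsed time, so any run longer than one second silently wraps around; Stopwatch.ElapsedMilliseconds or Elapsed.TotalMilliseconds gives the full duration. A small demonstration with a fixed TimeSpan (ElapsedDemo is a made-up name):

```csharp
using System;

public static class ElapsedDemo
{
    // 1.5 seconds: 0 days, 0 hours, 0 minutes, 1 second, 500 milliseconds.
    public static readonly TimeSpan OneAndAHalfSeconds = new TimeSpan(0, 0, 0, 1, 500);

    // Millisecond component only: the whole second is dropped.
    public static int ComponentOnly() => OneAndAHalfSeconds.Milliseconds;

    // Full duration expressed in milliseconds.
    public static double Total() => OneAndAHalfSeconds.TotalMilliseconds;
}
```

ComponentOnly() returns 500 while Total() returns 1500, which is why timings around or above one second can look deceptively small.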

Here’s the first version of the function
Run time for 100 million iterations: around 900 ms

    public void CalculateFaceVertex(ref Vector3[] FaceVertex, int ThisFaceId, int x, int y, int z)
    {
        Vector3 Origin = new Vector3(x, y, z);
        for (int i = 0; i < 4; i++)
        {
            FaceVertex[i] = FaceVertexHelp[ThisFaceId, i] + Origin; //Find out why this is inefficient. Wrote topic on Unity forums.

        }
    }

Here’s the second version of the function
Run time for 100 million iterations: around 900 ms

    public void CalculateFaceVertex(ref Vector3[] FaceVertex, int ThisFaceId, int x, int y, int z)
    {
        Vector3 Origin = new Vector3(x, y, z);
        for (int i = 0; i < 4; i++)
        {
            //FaceVertex[i] = FaceVertexHelp[ThisFaceId, i] + Origin; //Find out why this is inefficient. Wrote topic on Unity forums.
            FaceVertex[i].x = FaceVertexHelp[ThisFaceId, i].x + Origin.x;
            FaceVertex[i].y = FaceVertexHelp[ThisFaceId, i].y + Origin.y;
            FaceVertex[i].z = FaceVertexHelp[ThisFaceId, i].z + Origin.z;
        }
    }

Here’s the third version of the function
Run time for 100 million iterations: around 700 ms

    public void CalculateFaceVertex(ref Vector3[] FaceVertex, int ThisFaceId, int x, int y, int z)
    {
       // Vector3 Origin = new Vector3(x, y, z);
        for (int i = 0; i < 4; i++)
        {
            //FaceVertex[i] = FaceVertexHelp[ThisFaceId, i] + Origin; //Find out why this is inefficient. Wrote topic on Unity forums.
            FaceVertex[i].x = FaceVertexHelp[ThisFaceId, i].x + x;
            FaceVertex[i].y = FaceVertexHelp[ThisFaceId, i].y + y;
            FaceVertex[i].z = FaceVertexHelp[ThisFaceId, i].z + z;
        }
    }

Here’s the fourth version of the function
Run time for 100 million iterations: around 600 ms

    public void CalculateFaceVertex(ref Vector3[] FaceVertex, int ThisFaceId, int x, int y, int z)
    {
        Vector3 Origin = new Vector3(x, y, z);
        for (int i = 0; i < 4; i++)
        {
            //FaceVertex[i] = FaceVertexHelp[ThisFaceId, i] + Origin; //Find out why this is inefficient. Wrote topic on Unity forums.
            //FaceVertex[i].x = FaceVertexHelp[ThisFaceId, i].x + x;
            //FaceVertex[i].y = FaceVertexHelp[ThisFaceId, i].y + y;
            //FaceVertex[i].z = FaceVertexHelp[ThisFaceId, i].z + z;
            Vector3Add(ref FaceVertex[i], FaceVertexHelp[ThisFaceId, i],Origin);
        }
    }

    public void Vector3Add(ref Vector3 res, Vector3 a, Vector3 b)
    {
        res = a + b;
    }

Here’s the fifth version of the function
Run time for 100 million iterations: around 400 ms

    public void CalculateFaceVertex(ref Vector3[] FaceVertex, int ThisFaceId, int x, int y, int z)
    {
        //Vector3 Origin = new Vector3(x, y, z);
        for (int i = 0; i < 4; i++)
        {
            //FaceVertex[i] = FaceVertexHelp[ThisFaceId, i] + Origin; //Find out why this is inefficient. Wrote topic on Unity forums.
            //FaceVertex[i].x = FaceVertexHelp[ThisFaceId, i].x + x;
            //FaceVertex[i].y = FaceVertexHelp[ThisFaceId, i].y + y;
            //FaceVertex[i].z = FaceVertexHelp[ThisFaceId, i].z + z;
            Vector3Add(ref FaceVertex[i], FaceVertexHelp[ThisFaceId, i],x,y,z);
        }
    }

    public void Vector3Add(ref Vector3 res, Vector3 a, int x, int y, int z)
    {
        res.x = a.x + x;
        res.y = a.y + y;
        res.z = a.z + z;
    }

Here’s the sixth version of the function
Run time for 100 million iterations: around 400 ms

    public void CalculateFaceVertex(ref Vector3[] FaceVertex, int ThisFaceId, int x, int y, int z)
    {
        //Vector3 Origin = new Vector3(x, y, z);
        for (int i = 0; i < 4; i++)
        {
            //FaceVertex[i] = FaceVertexHelp[ThisFaceId, i] + Origin; //Find out why this is inefficient. Wrote topic on Unity forums.
            //FaceVertex[i].x = FaceVertexHelp[ThisFaceId, i].x + x;
            //FaceVertex[i].y = FaceVertexHelp[ThisFaceId, i].y + y;
            //FaceVertex[i].z = FaceVertexHelp[ThisFaceId, i].z + z;
            Vector3Add(ref FaceVertex[i], FaceVertexHelp[ThisFaceId, i],ref x,ref y,ref z);
        }
    }

    public void Vector3Add(ref Vector3 res, Vector3 a, ref int x, ref int y, ref int z)
    {
        res.x = a.x + x;
        res.y = a.y + y;
        res.z = a.z + z;
    }

FYI, here is the missing code, though it won’t make a difference:

    private static readonly Vector3[,] FaceVertexHelp = new Vector3[,]
    {
        {new Vector3(0, 1, 0), new Vector3(0, 1, 1), new Vector3(1, 1, 1), new Vector3(1, 1, 0)}, //top
        {new Vector3(0, 0, 0), new Vector3(0, 0, 1), new Vector3(1, 0, 1), new Vector3(1, 0, 0)}, //bottom
        {new Vector3(0, 0, 0), new Vector3(0, 1, 0), new Vector3(1, 1, 0), new Vector3(1, 0, 0)}, //front
        {new Vector3(0, 0, 1), new Vector3(0, 1, 1), new Vector3(1, 1, 1), new Vector3(1, 0, 1)}, //back
        {new Vector3(0, 0, 1), new Vector3(0, 1, 1), new Vector3(0, 1, 0), new Vector3(0, 0, 0)}, //left
        {new Vector3(1, 0, 1), new Vector3(1, 1, 1), new Vector3(1, 1, 0), new Vector3(1, 0, 0)}, //right
    };

In the end, I still got a 2.25x improvement…
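For anyone who wants to reproduce this outside Unity, here is a self-contained sketch of the fifth version's approach (component-wise writes straight into the output array, no temporary Origin), plus one untested tweak: converting the integer coordinates to float once before the loop instead of on every component. Vec3 is a hypothetical stand-in for UnityEngine.Vector3, FaceVertexSketch is a made-up name, and only the top face of FaceVertexHelp is included.

```csharp
// Hypothetical stand-in for UnityEngine.Vector3 so the sketch compiles outside Unity.
public struct Vec3
{
    public float x, y, z;
    public Vec3(float x, float y, float z) { this.x = x; this.y = y; this.z = z; }
}

public static class FaceVertexSketch
{
    // Only the "top" row of FaceVertexHelp from the thread; the other five faces are omitted.
    static readonly Vec3[,] FaceVertexHelp =
    {
        { new Vec3(0, 1, 0), new Vec3(0, 1, 1), new Vec3(1, 1, 1), new Vec3(1, 1, 0) }, // top
    };

    // Writes the four corners of face faceId, offset by the block position (x, y, z).
    // The array is not passed by ref: arrays are reference types and it is never reassigned.
    public static void CalculateFaceVertex(Vec3[] faceVertex, int faceId, int x, int y, int z)
    {
        // Untested tweak: convert the integer coordinates to float once, before the loop.
        float fx = x, fy = y, fz = z;
        for (int i = 0; i < 4; i++)
        {
            Vec3 corner = FaceVertexHelp[faceId, i];
            faceVertex[i].x = corner.x + fx;
            faceVertex[i].y = corner.y + fy;
            faceVertex[i].z = corner.z + fz;
        }
    }
}
```

Whether hoisting the conversion helps on Unity's runtime is something to time with the same Stopwatch loop; the JIT may already do it.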

ps. 7th version (below, with AggressiveInlining on CalculateFaceVertex): 100 million iterations in 330 ms, a 2.72x improvement.
pps. Adding AggressiveInlining to the Vector3Add function leads to worse timings.

   [MethodImpl(MethodImplOptions.AggressiveInlining)]
   public void CalculateFaceVertex(ref Vector3[] FaceVertex, int ThisFaceId, int x, int y, int z)
   {
       //Vector3 Origin = new Vector3(x, y, z);
       for (int i = 0; i < 4; i++)
       {
           //FaceVertex[i] = FaceVertexHelp[ThisFaceId, i] + Origin; //Find out why this is inefficient. Wrote topic on Unity forums.
           //FaceVertex[i].x = FaceVertexHelp[ThisFaceId, i].x + x;
           //FaceVertex[i].y = FaceVertexHelp[ThisFaceId, i].y + y;
           //FaceVertex[i].z = FaceVertexHelp[ThisFaceId, i].z + z;

           Vector3Add(ref FaceVertex[i], FaceVertexHelp[ThisFaceId, i],ref x,ref y,ref z);
       }
   }