Vector3 Operations performance

I did some testing on Vector3 operations (subtractions and additions) and found something interesting that I don’t understand. (By the way, I have a bad toothache and didn’t feel like doing any serious work, so I was just poking around.)

    IEnumerator VectorOperations()
    {
        Vector3 position = new Vector3(1, 1, 1);
        Vector3 origin = new Vector3(0, 0, 0);

        while (true)
        {
            for (int i = 0; i < 25000; i++)
            {
                Vector3 addition = origin + position;
                Vector3 subtraction = origin - position;

                Vector3 addition2 = new Vector3(origin.x + position.x, origin.y + position.y, origin.z + position.z);
                Vector3 subtraction2 = new Vector3(origin.x - position.x, origin.y - position.y, origin.z - position.z);
            }

            yield return null;
        }
    }

The addition and subtraction operations each took about 8.5 ms to complete in the loop, while the longer-form versions only took about 1.85 ms each. Why are simple vector operations almost 5 times slower in the first form?

I realize that looping 25,000 times just to see these numbers is unrealistic. However, it does show a huge performance difference between these two ways of doing simple vector operations.

At 25k iterations there was no measurable difference, but at 250k there is one.

I’m thinking that there are checks for invalid operations? Or maybe it’s something to do with the way structs work?

On my PC, at 25k iterations, I am looking at about 4.5 times slower. For subtractions it is 8.5 ms vs. 1.85 ms, which is pretty big.

Q6600 running at 2.8 GHz

Mobile needs to consider these numbers in larger collections too.

Unity 3.4, Intel Core i5 Mobile (slower than Q6600):

The script below, attached to the main camera of an empty scene, yields 120 ms for the first pair of operations (operator form) and 60 ms for the second pair (long form).
That works out to about 12-24 nanoseconds per vector operation (60-120 ms spread over 2,500,000 iterations of two operations each), which still seems slow for three float additions.
Frankly, I am not sure what the reason is, and I don’t have time to investigate this.

using UnityEngine;
using System.Collections;
using System.Diagnostics;

public class NewBehaviourScript : MonoBehaviour {

    // Use this for initialization
    void Start () {
        Stopwatch watch = new Stopwatch();
        watch.Start();
        Vector3 position = new Vector3(1, 1, 1);
        Vector3 origin = new Vector3(0, 0, 0);

        for (int i = 0; i < 2500000; i++)
        {
            //Vector3 addition = origin + position;
            //Vector3 subtraction = origin - position;

            Vector3 addition2 = new Vector3(origin.x + position.x, origin.y + position.y, origin.z + position.z);
            Vector3 subtraction2 = new Vector3(origin.x - position.x, origin.y - position.y, origin.z - position.z);
        }

        watch.Stop();
        UnityEngine.Debug.Log(watch.Elapsed.TotalMilliseconds);
    }

    // Update is called once per frame
    void Update () {

    }
}

I just tested a manual dot product against Vector3.Dot in the same coroutine and got 0.18 ms vs. 5.0 ms …

dotProduct = (forward.x * direction.x) + (forward.y * direction.y) + (forward.z * direction.z);         
//dotProduct = Vector3.Dot(forward, direction);

By the way, all my numbers come from Deep Profiling, which obviously makes them look worse overall. However, their relative differences should remain constant (correct?)

Use the code I wrote ^^. It will measure with around nanosecond precision (at least on a recent computer, and hopefully on Mono too)…

And never use a profiler for such measurements; you won’t get any useful information. A profiler is meant to detect hotspots, and it takes measures to neutralize its own impact on the timings it reports. If you do your own measurements during a profiling session, they will most likely be useless. In this case I suspect that the profiler “hooks” into the struct constructor to track memory allocation, which is why it takes much longer for you. For me, the version that goes through explicit constructors is twice as fast as the version without them…

I am using that now :)

I now get 52.5094 ms using the long-form addition and subtraction vs. 114.5767 ms using the short form, so the gap is not as bad as it was under deep profiling, but the short form is still half the speed. Granted, it takes a lot of iterations to get those numbers, which puts things back in perspective.

I knew the numbers from Deep Profiling would be worse, but I never expected the relationship between them to be different. I always thought that if the profiler showed one function running at half the speed of another, this would be consistent.

Profilers are usually bad at measuring the speed of fast functions. This is why the more advanced ones will sieve them out at runtime and suggest that you add them to an ignore list. As you have now discovered, profiling fast functions might not only lead to wrong results but will also heavily reduce performance during profiling (as you can notice in Unity), and it can potentially invalidate the results of slower functions that call these profiled fast functions…

So the bottom line is that profiling is intended for larger functions, where the profiling overhead is far outweighed by the execution time of the function itself.
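As a rough illustration of measuring this outside the profiler, here is a minimal sketch of a Stopwatch comparison of the two forms. It is not the code from this thread: `Vec3` is a hypothetical stand-in for `UnityEngine.Vector3` so the sketch runs as a plain C# console program, and `VectorBench`, `SumWithOperators`, `SumInline` and `TimeMs` are made-up names. The warm-up calls and the `sink` accumulator are assumptions about what makes the timing fairer (keeping JIT compilation out of the timed region and keeping the results in use), and the timings will differ from what Unity's Mono produces.

```csharp
using System;
using System.Diagnostics;

// Hypothetical stand-in for UnityEngine.Vector3 so the sketch runs outside Unity.
public struct Vec3
{
    public float x, y, z;

    public Vec3(float x, float y, float z) { this.x = x; this.y = y; this.z = z; }

    public static Vec3 operator +(Vec3 a, Vec3 b) { return new Vec3(a.x + b.x, a.y + b.y, a.z + b.z); }
    public static Vec3 operator -(Vec3 a, Vec3 b) { return new Vec3(a.x - b.x, a.y - b.y, a.z - b.z); }
}

public static class VectorBench
{
    // Short form: overloaded operators.
    public static double SumWithOperators(int iterations)
    {
        Vec3 position = new Vec3(1, 1, 1);
        Vec3 origin = new Vec3(0, 0, 0);
        double sink = 0;
        for (int i = 0; i < iterations; i++)
        {
            Vec3 addition = origin + position;
            Vec3 subtraction = origin - position;
            sink += addition.x - subtraction.x; // use the results so the work is not dead code
        }
        return sink;
    }

    // Long form: component-wise arithmetic written out by hand.
    public static double SumInline(int iterations)
    {
        Vec3 position = new Vec3(1, 1, 1);
        Vec3 origin = new Vec3(0, 0, 0);
        double sink = 0;
        for (int i = 0; i < iterations; i++)
        {
            Vec3 addition = new Vec3(origin.x + position.x, origin.y + position.y, origin.z + position.z);
            Vec3 subtraction = new Vec3(origin.x - position.x, origin.y - position.y, origin.z - position.z);
            sink += addition.x - subtraction.x;
        }
        return sink;
    }

    // Runs one variant and returns the elapsed wall-clock time in milliseconds.
    static double TimeMs(Func<int, double> body, int iterations, out double sink)
    {
        Stopwatch watch = Stopwatch.StartNew();
        sink = body(iterations);
        watch.Stop();
        return watch.Elapsed.TotalMilliseconds;
    }

    public static void Main()
    {
        const int iterations = 2500000;
        double sink;

        // Warm-up: let the JIT compile both methods before timing them.
        SumWithOperators(1000);
        SumInline(1000);

        double shortMs = TimeMs(SumWithOperators, iterations, out sink);
        Console.WriteLine("operators: " + shortMs + " ms (sink " + sink + ")");

        double longMs = TimeMs(SumInline, iterations, out sink);
        Console.WriteLine("inline:    " + longMs + " ms (sink " + sink + ")");
    }
}
```

Both variants return the same `sink` value, which is a cheap check that they do the same work. Running each variant several times and comparing the spread would say more than a single run.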

Q6600 @ 2.4 GHz and both ran at ~1 ms.

Using Mr. Burns’ Stopwatch method, I still see a big difference, although not as significant…

For instance:

    for (int i = 0; i < 2500000; i++)
    {
        // These two get done in 116.8739ms
        //Vector3 addition = origin + position;
        //Vector3 subtraction = origin - position;

        // These two get done in 53.2564ms
        Vector3 addition2 = new Vector3(origin.x + position.x, origin.y + position.y, origin.z + position.z);
        Vector3 subtraction2 = new Vector3(origin.x - position.x, origin.y - position.y, origin.z - position.z);
    }

That is still twice as fast for manually adding or subtracting vectors. This is true for dot products or any other vector operations I have tested.

Calling a function or using overloaded operators is going to be slower than using inline code.

This is a managed environment ^^. That is a very vague assumption without knowing how Mono does it. It is possible to optimize this; that is one great thing about managed environments, called “runtime optimization”, but I am not up to date on how much of it Mono already uses.
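To make the "operator versus inline code" point concrete, here is a small sketch of what an overloaded operator is from the compiler's point of view. `Vec3` is again a hypothetical stand-in for `UnityEngine.Vector3`, not Unity's actual source, and `OperatorDemo` is a made-up name. The C# compiler emits `operator +` as a static method named `op_Addition`, so `origin + position` is a method call with both structs passed by value, unless the runtime decides to inline it. Whether a given Mono version does so is exactly the open question here.

```csharp
using System;

// Hypothetical stand-in for UnityEngine.Vector3 so the sketch runs outside Unity.
public struct Vec3
{
    public float x, y, z;

    public Vec3(float x, float y, float z) { this.x = x; this.y = y; this.z = z; }

    // Compiled to a static method named op_Addition. Both operands are
    // passed by value, so a non-inlined call copies two structs.
    public static Vec3 operator +(Vec3 a, Vec3 b)
    {
        return new Vec3(a.x + b.x, a.y + b.y, a.z + b.z);
    }
}

public static class OperatorDemo
{
    // True if the operator is visible as an ordinary static method on the type.
    public static bool OperatorIsMethod()
    {
        return typeof(Vec3).GetMethod("op_Addition") != null;
    }

    // True if the operator form and the hand-written form give the same result.
    public static bool FormsAgree()
    {
        Vec3 origin = new Vec3(0, 0, 0);
        Vec3 position = new Vec3(1, 1, 1);

        Vec3 viaOperator = origin + position;
        Vec3 viaInline = new Vec3(origin.x + position.x, origin.y + position.y, origin.z + position.z);

        return viaOperator.x == viaInline.x
            && viaOperator.y == viaInline.y
            && viaOperator.z == viaInline.z;
    }

    public static void Main()
    {
        Console.WriteLine(OperatorIsMethod()); // prints "True"
        Console.WriteLine(FormsAgree());       // prints "True"
    }
}
```

The two forms compute the same thing; any speed difference comes from the call and the struct copies, not from the arithmetic.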

Read my earlier post. I also noted a difference, and FYI my code was better than Mr. Burns’ :P

My apologies, but I only referenced Mr. Burns because I am employed at his nuclear plant in Springfield and didn’t want to suffer his wrath. And yes, you did note those differences as well.

Really? I’ve heard a lot about your safety inspector.

@NPSF3000: Not that it bothers me much but which improvements do you have for the stopwatch ;)?

In this thread you can find an actual optimization for Vector3 and other structs

A 9-year-old necro? How did you end up here? :D And Stephan is already working for Unity in the meantime.