Instance of Vector3.zero faster than Vector3.zero?

I’m currently running an optimization audit on my game’s codebase and noticed Vector3.zero coming up in the profiler. It never really dawned on me that accessing that value would have overhead: a constructor call plus what looks to be a property getter in the Deep Profile information.
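For what it’s worth, Vector3.zero is a static property, not a constant: every access runs a getter that constructs a fresh struct. A minimal sketch of that behavior, using a stand-in Vec3 struct (not Unity’s actual source):

```csharp
using System;

struct Vec3
{
    public float x, y, z;
    public Vec3(float x, float y, float z) { this.x = x; this.y = y; this.z = z; }

    // A property in the style of Unity's Vector3.zero: every access
    // invokes this getter and constructs a brand-new struct value.
    public static Vec3 zero { get { return new Vec3(0f, 0f, 0f); } }
}

class Program
{
    static void Main()
    {
        Vec3 a = Vec3.zero;   // getter call + constructor
        Vec3 b = a;           // plain struct copy, no getter involved
        Console.WriteLine(a.x + " " + b.x); // prints "0 0"
    }
}
```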

Here’s my test. I wanted to share it with you guys and get your feedback. It’s not much at first glance, but on my machine it was about 1 second faster to access an instance variable holding Vector3.zero (vZero in the code below) than to access Vector3.zero directly, over 5000 iterations of each test, profiler off. It was actually 5 seconds faster with the profiler on, but that’s just miscellaneous information. Even at 1000 iterations, the instance came out half a second ahead.

Just imagine how many places you might end up substituting instances for these convenient, but seemingly not-as-performant, vector “getters” throughout your code!

using UnityEngine;
using System.Collections;

public class VectorBenchmarks : MonoBehaviour {
    Vector3 vZero = Vector3.zero;
    Vector3 targetVector;
    public int testCount = 5000;
    public int currentTest = 0;
    // Use this for initialization
    void Start () {

        StartCoroutine("VectorTest");
    }

    IEnumerator VectorTest()
    {
        Debug.Log("Beginning test one, Vector3.zero");
        float startTime;
        float endTime;
        startTime = Time.time;
        while(currentTest < testCount ) {
            targetVector = Vector3.zero;
            ++currentTest;
            yield return null;
        }
        endTime = Time.time;

        Debug.Log("End Test One: startTime: " + startTime + " endTime: " + endTime + " duration of test: " + (endTime - startTime));

        currentTest = 0;

        Debug.Log("Beginning test two, instance variable definition of Vector3.zero");
        startTime = Time.time;
        while(currentTest < testCount ) {
            targetVector = vZero;
            ++currentTest;
            yield return null;
        }
        endTime = Time.time;
        Debug.Log("End Test Two: startTime: " + startTime + " endTime: " + endTime + " duration of test: " + (endTime - startTime));
    }
}

Here is my attempt at this:

using UnityEngine;
using System.Collections;

public class VectorProfile : MonoBehaviour {
	float t1, t2;
	// Use this for initialization
	void Start () {
		Vector3 v;
		Vector3 vz = Vector3.zero;
		t1 = Time.realtimeSinceStartup;
		for (int i = 0; i < 100000000; i++) {
			v = Vector3.zero;
		}
		Debug.Log("T1: " + (Time.realtimeSinceStartup - t1).ToString());
		t2 = Time.realtimeSinceStartup;
		for (int i = 0; i < 100000000; i++) {
			v = vz;
		}
		Debug.Log("T2: " + (Time.realtimeSinceStartup - t2).ToString());
	
	}
	
	// Update is called once per frame
	void Update () {
	
	}
}

Here are my results of doing 100,000,000 iterations of each.

T1: 2.046641
T2: 0.5151472

T1, the first test, uses Vector3.zero. T2 uses an instance of Vector3.zero called vz.

Your performance test is seriously flawed. Here are a couple of things that spring to mind:

  • Use the right tool for the job. Time.time is not meant for precision timing. Use System.Diagnostics.Stopwatch.
  • Always measure in deployed programs, not in the editor or in development builds.
  • Try to minimize the “code under measurement”. Especially stay away from heavy noise like rendering frames or starting coroutines!
  • If possible, test the “empty case” as a control group. This measures your test-harness overhead. If you did the previous points right, it will come out as “0”. Also, you will be surprised how often you get a “WTF??” moment, when the empty case turns out to be slower than the normal case. :smiley: (If that happens, read up on “JIT compiler in C#”.)
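Putting those points together, a minimal self-contained harness might look like this (plain .NET console code so it runs outside Unity; the loop bodies are placeholders for whatever you want to measure):

```csharp
using System;
using System.Diagnostics;

class Benchmark
{
    const int Iterations = 10000000;

    static void Main()
    {
        // Control group: an empty loop measures the harness overhead itself.
        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < Iterations; ++i) { }
        watch.Stop();
        long emptyMs = watch.ElapsedMilliseconds;

        // Code under measurement: keep the loop body minimal.
        int sink = 0; // consuming the result prevents the JIT from eliminating the loop body
        watch.Restart();
        for (int i = 0; i < Iterations; ++i) { sink += i & 1; }
        watch.Stop();
        long workMs = watch.ElapsedMilliseconds;

        Console.WriteLine("empty: " + emptyMs + " ms");
        Console.WriteLine("work:  " + workMs + " ms, sink=" + sink);
    }
}
```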

Sure, you can usually skip some or most of these points. If you are sure of what you are doing and the performance differences are drastic, there is no need to go high-precision.

But in your case, you want to measure 1 second over 5000 iterations, which works out to 200 microseconds per iteration. That is far too small an effect to measure with such a crude axe of a technique.

In the end, Vector3.zero may still be 100 times slower than accessing a non-volatile local member copy (my money is on either “cache faults” or “bad mono 2.6 JIT compiler”), but I really doubt it takes your measured 200µs for every access.

Here is what I did:

Vector3 vZero = Vector3.zero;
void Start() {
	var watch = new System.Diagnostics.Stopwatch();
	Vector3 v;
	
	watch.Reset(); watch.Start();
	for (int i = 0; i < 10000000; ++i)
	{ v = Vector3.zero; }
	watch.Stop();
	result += "property: " + watch.ElapsedMilliseconds + " ms\n\n";

	watch.Reset(); watch.Start();
	for (int i = 0; i < 10000000; ++i)
	{ v = vZero; }
	watch.Stop();
	result += "local: " + watch.ElapsedMilliseconds + " ms\n\n";

	watch.Reset(); watch.Start();
	for (int i = 0; i < 10000000; ++i)
	{ }
	watch.Stop();
	result += "empty: " + watch.ElapsedMilliseconds + " ms\n\n";
}

string result = "";
void OnGUI()
{
	GUI.Label(new Rect(20, 20, 500, 100), result);
}

On my machine, it spits out

property: 120 ms
local: 21 ms
empty: 10 ms

So by my measurement, the difference is roughly 100 ms for 10 million iterations, or in other words: about 10 nanoseconds per access.
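As a back-of-the-envelope check on that figure (the inputs are just the measurements above):

```csharp
using System;

class PerAccessCost
{
    static void Main()
    {
        double propertyMs = 120, localMs = 21;   // measured results from above
        double iterations = 10000000;
        // Difference per access, converted from milliseconds to nanoseconds.
        double nsPerAccess = (propertyMs - localMs) / iterations * 1e6;
        Console.WriteLine("{0:F1} ns per access", nsPerAccess);
    }
}
```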