General performance/optimization tips for Unity

This reminded me of something super duper important that not many people know:
Whenever you call this.transform in a MonoBehaviour, it actually does a GetComponent (or an equivalent) for the Transform component every time. So you should always cache your Transform in Awake() or Start() and use the cached reference later in Update().

I can prove it by re-using my 10k monobehaviours test that each move their transform on Update(). If I don’t cache the transform, and instead use this.transform to access it every frame, the average ms per frame jumps from 8 to 11 ms

Same goes for Camera.main and all sorts of gameObject.something
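For anyone unsure what "caching" means here, it looks roughly like this. It's a minimal sketch: the class name CachedMover and the field _cachedTransform are made up for illustration, and whether it actually helps depends on your Unity version, so measure before relying on it.

```csharp
using UnityEngine;

// Hypothetical example component; the names are illustrative only.
public class CachedMover : MonoBehaviour
{
    private Transform _cachedTransform;

    void Awake()
    {
        // Look the Transform up once and keep the managed reference.
        _cachedTransform = transform;
    }

    void Update()
    {
        // Use the cached reference instead of this.transform every frame.
        _cachedTransform.position += Vector3.forward * Time.deltaTime;
    }
}
```

The same idea applies to Camera.main: store the result in a field once instead of reading the property every frame.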

EDIT: Okay, that’s pretty weird. What if I told you that this:

void Update()
{
    gameObject.GetComponent<Transform>().position += Vector3.forward * Time.deltaTime;
}

Performs slightly better than this:

void Update()
{
    this.transform.position += Vector3.forward * Time.deltaTime;
}

The gameObject.GetComponent<Transform>() version is actually 1 ms faster…

I dunno. If it becomes super useful, then it’s probably better to put it on the wiki or something, the reason being that the data can go out of date with Unity versions, so it should also explicitly state the version. Plus, it should be peer reviewed rather than taken as gospel. Following the thread though, to see what happens.

Then there’s the whole IL2CPP and ongoing optimisations…

Speaking of out of date optimisations:

What version are you using? From 5.0 on, the transforms are cached by the engine anyway (they were supposed to be; I seldom have much need for this sort of micro-optimisation). And none of the other GameObject.somethingElse properties work.

I thought I heard that too, but it seems that, for whatever reason, this.transform is heavier than GetComponent<Transform>(), which is in turn heavier than a manually cached transform.

The test I made was on 5.3.2f1


Well of course a manually cached transform is going to be fastest. It’s already referenced locally in Mono, no overhead of a function call (this.transform is just a property accessor, a function call). As well as no overhead of communicating with the internal unity runtime.

Thing is, I just tested, and ‘this.transform’ is faster than ‘this.GetComponent<Transform>()’ on my system. This was with 5.3.1f1.

Code displays averages

using UnityEngine;

public class ZTestScript02 : MonoBehaviour {

    public int Count = 500000;

    private Transform _t;
    double _directMS;
    double _propertyMS;
    double _genericMS;
    double _typeMS;

    void Awake()
    {
        _t = this.GetComponent<Transform>();
    }

    void Update()
    {

        var watch = new System.Diagnostics.Stopwatch();

        watch.Start();
        for (int i = 0; i < this.Count; i++)
        {
            var t = _t;
        }
        watch.Stop();
        _directMS += (watch.Elapsed.TotalMilliseconds - _directMS) * 0.1d;
        watch.Reset();



        watch.Start();
        for(int i = 0; i < this.Count; i++)
        {
            var t = this.transform;
        }
        watch.Stop();
        _propertyMS += (watch.Elapsed.TotalMilliseconds - _propertyMS) * 0.1d;
        watch.Reset();



        watch.Start();
        for (int i = 0; i < this.Count; i++)
        {
            var t = this.GetComponent<Transform>();
        }
        watch.Stop();
        _genericMS += (watch.Elapsed.TotalMilliseconds - _genericMS) * 0.1d;
        watch.Reset();



        watch.Start();
        for (int i = 0; i < this.Count; i++)
        {
            var t = this.GetComponent(typeof(Transform)) as Transform;
        }
        watch.Stop();
        _typeMS += (watch.Elapsed.TotalMilliseconds - _typeMS) * 0.1d;
        watch.Reset();
    }
  
    void OnGUI()
    {
        //display average access time
        GUILayout.Label(new GUIContent(string.Format("{0:0.000}ms", _directMS)));
        GUILayout.Label(new GUIContent(string.Format("{0:0.000}ms", _propertyMS)));
        GUILayout.Label(new GUIContent(string.Format("{0:0.000}ms", _genericMS)));
        GUILayout.Label(new GUIContent(string.Format("{0:0.000}ms", _typeMS)));
    }

}

Speeds on my i7-2600K at 3.4 GHz with 32 GB RAM and a GeForce GTX 960, running Windows 10, Unity 5.3.1f1.
Times are for copying references 500,000 times each.
direct: ~2ms
property: ~14ms
GetComponent: ~30ms
GetComponent(typeof): ~51ms

I’m testing merely just copying a reference to the transform, as this is the rawest data I can think of. No modifying or anything, as that would just dilute the results.

the transform property appears to be slightly over twice as fast as GetComponent.
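A side note on how the numbers are produced: the `avg += (sample - avg) * 0.1d` lines are an exponential moving average, so each frame nudges the displayed value 10% of the way toward the latest sample. Here is a small standalone sketch of that idea. The class and method names (TimingSketch, Smooth, TimeMs) are made up for illustration. It reads `Elapsed.TotalMilliseconds`, which gives fractional milliseconds, whereas `Elapsed.Milliseconds` is only the whole-millisecond component of the elapsed time.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper; names are illustrative only.
public static class TimingSketch
{
    // Exponential moving average: moves 'average' 10% of the way toward 'sample'.
    public static double Smooth(double average, double sample)
    {
        return average + (sample - average) * 0.1d;
    }

    // Times 'work' once and returns the elapsed time in fractional milliseconds.
    public static double TimeMs(Action work)
    {
        var watch = Stopwatch.StartNew();
        work();
        watch.Stop();
        return watch.Elapsed.TotalMilliseconds;
    }
}
```

Usage would be something like `avg = TimingSketch.Smooth(avg, TimingSketch.TimeMs(SomeLoop));` once per frame, where SomeLoop is whatever you want to measure.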

Note that GetComponent<T>() is faster than GetComponent(typeof(T)). This is because Unity has changed the way the generic GetComponent gets components, in a rather ingenious way if I might say so. It used to be just a forwarding method to the typeof version, but now it does this:

from decompiled UnityEngine.dll:

    [WrapperlessIcall]
    [MethodImpl(MethodImplOptions.InternalCall)]
    internal void GetComponentFastPath(System.Type type, IntPtr oneFurtherThanResultValue);

    [SecuritySafeCritical]
    public unsafe T GetComponent<T>()
    {
      CastHelper<T> castHelper = new CastHelper<T>();
      this.GetComponentFastPath(typeof (T), new IntPtr((void*) &castHelper.onePointerFurtherThanT));
      return castHelper.t;
    }

Note the ‘CastHelper’ is a struct like this:

using System;

namespace UnityEngine
{
  internal struct CastHelper<T>
  {
    public T t;
    public IntPtr onePointerFurtherThanT;
  }
}

So basically they create a struct on the stack in the method ‘GetComponent’, they then pass in the memory address of the field that just follows a field typed to what we want the result to be. Since we know how the struct packs, the internal code knows the reference should be placed 4/8 bytes (32/64bit) before the address passed in.

This is nice since, on the internal side, they don’t have to deal with preparing the reference for return. Instead, all the internal side does is set the pointer at that address to the address of the component, which it has stored internally anyway, and you don’t need a cast on the Mono side, which you would if you called ‘GetComponent(typeof)’ since that returns it typed as Component.

It’s literally just an int copy.
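If the "one pointer further" trick is hard to picture, here is a rough analogy in plain C#. This is NOT Unity's code: the names (OnePointerBackSketch, NativeSideWrite, Demo) are made up, and it uses two adjacent pointer-sized slots of unmanaged memory as a stand-in for the two fields of CastHelper<T>. The "native side" is handed the address of the second slot and writes its result one pointer-width before it.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical analogy only; this is not how UnityEngine.dll is implemented.
public static class OnePointerBackSketch
{
    // Stand-in for the internal call: receives the address of the slot AFTER
    // the result slot and writes the result one pointer-width before it.
    static void NativeSideWrite(IntPtr oneFurtherThanResult, IntPtr result)
    {
        IntPtr resultSlot = new IntPtr(oneFurtherThanResult.ToInt64() - IntPtr.Size);
        Marshal.WriteIntPtr(resultSlot, result);
    }

    public static IntPtr Demo(IntPtr valueToReturn)
    {
        // Two adjacent pointer-sized slots, mimicking the struct's two fields.
        IntPtr block = Marshal.AllocHGlobal(IntPtr.Size * 2);
        try
        {
            Marshal.WriteIntPtr(block, IntPtr.Zero);
            IntPtr secondSlot = new IntPtr(block.ToInt64() + IntPtr.Size);
            NativeSideWrite(secondSlot, valueToReturn);
            // The first slot now holds whatever the "native side" wrote.
            return Marshal.ReadIntPtr(block);
        }
        finally
        {
            Marshal.FreeHGlobal(block);
        }
    }
}
```

The caller never casts anything; it just reads the first slot after the call, which is the point being made about the generic GetComponent.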

With that knowledge, this COULD result in higher efficiency on some machines vs the way the transform property works… maybe? Are you using OSX by any chance?


I actually just redid the test at home on a different machine with the exact same code I had earlier and my results are now similar to yours. Dunno what happened. I remember double-checking that I didn’t mix things up because it really surprised me, so I don’t think it was just a mistake.

I was on windows x64 in both cases.

But wouldn’t it be better to test a more common situation? Here you are retrieving the same thing over and over, which seems pointless to me. The optimization for that would be to just stop doing it, because it makes no sense, right? But what if you have half a million objects and you want to retrieve their individual transforms? That sounds a lot closer to a realistic use case to me. I just had a gut feeling that your attempt not to dilute the results moved them even further from being applicable to real-world scenarios. It’s just a gut feeling that I wanted to test, and I might have done something wrong because I’m still a noob with these things. So please have a look at my alternate test proposal and tell me what you think.

I’ve copied your original code first, let it run a little and the times are:
1.615 ms
11.986 ms
27.365 ms
44.459 ms
On an i7 with win7 64bit.

And now my twist on your test:

using UnityEngine;

public class ZTestScript06 : MonoBehaviour {

   private int Count = 500000;

   private Transform _t;
   double _directMS;
   double _propertyMS;
   double _genericMS;
   double _typeMS;
   GameObject[] array;

   void Awake()
   {
     _t = this.GetComponent<Transform>();
     array = new GameObject[Count];

     for (int i = 0; i < this.Count; i++)
     {
       array[i] = new GameObject();
     }
   }

   void Update()
   {

     var watch = new System.Diagnostics.Stopwatch();

     watch.Start();
     for (int i = 0; i < this.Count; i++)
     {
       //var t = _t;
       var t = array[i];
     }
     watch.Stop();
     _directMS += (watch.Elapsed.TotalMilliseconds - _directMS) * 0.1d;
     watch.Reset();



     watch.Start();
     for(int i = 0; i < this.Count; i++)
     {
       //var t = this.transform;
       var t = array[i].transform;
     }
     watch.Stop();
     _propertyMS += (watch.Elapsed.TotalMilliseconds - _propertyMS) * 0.1d;
     watch.Reset();



     watch.Start();
     for (int i = 0; i < this.Count; i++)
     {
       //var t = this.GetComponent<Transform>();
       var t = array[i].GetComponent<Transform>();
     }
     watch.Stop();
     _genericMS += (watch.Elapsed.TotalMilliseconds - _genericMS) * 0.1d;
     watch.Reset();



     watch.Start();
     for (int i = 0; i < this.Count; i++)
     {
       //var t = this.GetComponent(typeof(Transform)) as Transform;
       var t = array[i].GetComponent(typeof(Transform)) as Transform;
     }
     watch.Stop();
     _typeMS += (watch.Elapsed.TotalMilliseconds - _typeMS) * 0.1d;
     watch.Reset();
   }

   void OnGUI()
   {
     //display average access time
     GUILayout.Label(new GUIContent(string.Format("{0:0.000}ms", _directMS)));
     GUILayout.Label(new GUIContent(string.Format("{0:0.000}ms", _propertyMS)));
     GUILayout.Label(new GUIContent(string.Format("{0:0.000}ms", _genericMS)));
     GUILayout.Label(new GUIContent(string.Format("{0:0.000}ms", _typeMS)));
   }

}

1.996 ms //doesn’t retrieve the transform actually and just gets the object
62.069 ms
75.015 ms
93.616 ms

The differences seem a lot less pronounced now. Would you agree that this is a more useful test of real-world performance than the one originally proposed?

Well not really.

The point of the test wasn’t to demonstrate a real world situation. It was an attempt to measure the speed of each relative to one another, so to know which is actually the fastest, and not necessarily by how much.

Note, how in your results, the actual differences between them are:
accessing this
generic is ~15.4ms slower than property
typeof is ~17ms slower than generic

accessing array
generic is ~13ms slower than property
typeof is ~18ms slower than generic

The differences in speed are actually roughly the same. They’ve all scaled up in cost by about 50ms, every one of them, similarly. With one exception, the direct access… with good reason.

UnityEngine.Objects (GameObject, Transform, scripts, etc) have 2 parts to them. The mono/.net object, and the C++ object. They sit in two completely different parts of memory. When you call through to the internal unity stuff, it needs to lookup the related internal object to the mono/.net object.

Now… I don’t know how this is done personally. Maybe the instanceID is a hash on some hashtable, maybe they cache the pointer address somewhere, I just don’t know. What I do know though is that there is SOME cost to it, and it’s reasonable to assume that as it grows in size, it grows in cost.

And your example demonstrates that growth in cost. All methods had a similar increase in cost of calling it, 50ms. It’s reasonable to blame this cost on that.

And the fact that the direct access doesn’t come with that cost is easily explained as well. There’s no communication between mono and the unity internal code. All it is, is a direct memory access of the mono memory heap. Accessing an array by the [index] accessor, versus accessing a variable, is pretty much identical in speed.

You can run a test to prove it.

Here is a simple Console application that demonstrates it:

using System;

namespace Console01
{
    internal class Program
    {
        public static void Main()
        {

            const int COUNT = 500000;
            const int LOOP = 5000000;
            int[] arr = new int[COUNT];
            int no = 0;

            double noMS = 0d;
            double lowMS = 0d;
            double highMS = 0d;
            var watch = new System.Diagnostics.Stopwatch();
            var rand = new Random();

            while(true)
            {
                watch.Start();
                for (int i = 0; i < LOOP; i++)
                {
                    var j = no.ToString();
                }
                watch.Stop();
                noMS += (watch.Elapsed.TotalMilliseconds - noMS) * 0.1d;
                watch.Reset();

                int low = rand.Next(10);
                watch.Start();
                for (int i = 0; i < LOOP; i++)
                {
                    var j = arr[low].ToString();
                }
                watch.Stop();
                lowMS += (watch.Elapsed.TotalMilliseconds - lowMS) * 0.1d;
                watch.Reset();


                int high = rand.Next(COUNT - 11, COUNT - 1);
                watch.Start();
                for (int i = 0; i < LOOP; i++)
                {
                    var j = arr[high].ToString();
                }
                watch.Stop();
                highMS += (watch.Elapsed.TotalMilliseconds - highMS) * 0.1d;
                watch.Reset();


                Console.WriteLine("{0:0.000}ms : {1:0.000}ms : {2:0.000}ms", noMS, lowMS, highMS);
                System.GC.Collect(); //force collect all those strings, otherwise GC may throw off the StopWatch
            }
        }
   
    }

}

So yeah, I wasn’t going for a real world scenario (none of these examples are real world). I was just trying to compare the raw difference in property, generic GetComponent, and typeof GetComponent.

What your results do help to show others though is that there is an increased cost to accessing these methods when you have more objects in the scene.

But the relative difference between these methods remains the same.

NOW… and @PhilSA may find this interesting, there was a really WEIRD result I found when running your code though.

For smaller values of ‘Count’, downward of 10,000… the property accessor of Transform ended up being slower than the generic GetComponent call. Results similar to what PhilSA was claiming earlier.

At 10,000 items, with the array, I was getting:

property: ~1ms
generic: ~0.1ms
typeof: 0.2ms to 1.1ms (this one was weird, it jumped all over the place, and changed every time I played the benchmark)

Even though the typeof access jumped all over. The property and generic access was rock solid.

And this throws a whole wrench into the benchmark in general.

I have some theories as to why it might be happening… but I’m not exactly sure, so I can’t really say. But maybe the transform property algorithm has a cost with a near linear growth curve, whereas GetComponent has a cost with a polynomial growth curve.

So, for object counts lower than some N, GetComponent ends up being faster, but it ends up slower as the count grows past N.

Think of plotting a line and a parabola with its vertex at (0,0). For values of x near 0, the line will be above the parabola, but as x approaches infinity, the parabola ends up much higher than the line.

[Image: parabola_vs_line.png, a plot of a parabola against a line]
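To put numbers on the line-vs-parabola idea, here is a tiny sketch with made-up cost functions. The coefficients (100 units per object for the linear one, n squared for the other) are pure assumptions for illustration, not measurements of Unity, and the names are hypothetical.

```csharp
// Hypothetical cost models in arbitrary units; NOT measured from Unity.
public static class GrowthSketch
{
    public static long LinearCost(long n) { return 100L * n; }
    public static long QuadraticCost(long n) { return n * n; }

    // Returns the smallest n >= 1 at which the quadratic cost exceeds the linear one.
    public static long Crossover()
    {
        long n = 1;
        while (QuadraticCost(n) <= LinearCost(n))
        {
            n++;
        }
        return n;
    }
}
```

With these particular assumptions the quadratic cost is cheaper (or equal) up to n = 100 and more expensive from n = 101 on. That is the same shape as the theory above, just with invented numbers.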


Very interesting, thanks a lot for the explanation!

I just made a test for that thing I put in my quick tips section:

And I can now confirm that the performance difference of virtual functions or overrides is completely meaningless. The price of 100000 overridden function calls was about 0.5ms more than the non-virtual function. So yeah… not worth it

I’m removing that item from the list

EDIT:
Hold on, maybe it does matter after all.

The inheritance depth of a class does potentially have a noticeable effect on performance. Here’s my test output:

The cost increases with each inheritance level, so maybe don’t go too crazy with huge inheritance chains in which stuff happens at every frame


yeah, because basically as you add classes to the inheritance chain, and override, you have a function call for each level. They’re effectively independent functions from one another getting called up the chain.

There is an extended cost the first time it’s called, and the JIT compiler has to build the function chain. But that’s a one time cost.

But the overall overhead is mostly negligible. It’s really only noticeable for small functions, where the function does so little that its cost is comparable to the cost of the function call itself.
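Here’s a minimal sketch of what that chain looks like, assuming each override calls its base implementation (the class names Level0/Level1/Level2 and the Calls counter are made up for illustration):

```csharp
// Hypothetical three-level hierarchy; each override forwards to its base.
public class Level0
{
    public int Calls;

    public virtual void Tick()
    {
        Calls++;
    }
}

public class Level1 : Level0
{
    public override void Tick()
    {
        Calls++;
        base.Tick(); // one extra call per inheritance level
    }
}

public class Level2 : Level1
{
    public override void Tick()
    {
        Calls++;
        base.Tick();
    }
}
```

A single `new Level2().Tick()` runs three method bodies, one per level. If an override does not call base, the chain stops there and the extra cost goes away.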

If you had to do a 5 minute job, and it took you 3 minutes to prepare for it, that 3 minute preparation time is HUGE. If you added 3 more 5 minute jobs, all taking 3 minutes to prep for, the prep time would annoy you, as you just spent 32 minutes doing 20 minutes of work. You might want to reorganize that job so that you only prep once: 3 minutes prep, 20 minutes work, and 23 minutes overall.

Whereas if you had an hour long job, the 3 minutes isn’t even noticeable.

Virtual methods, and overrides in general, are usually for integral tasks, or heavy tasks that need possible modification. Seldom are they for small minor tasks… a property getter is seldom overridable.

Constructors are a different story: they can’t be virtual or overridden at all, but a derived class’s constructor always chains to a base constructor, whether you write that out or not, since construction is integral.

In Unity, virtual methods would be integral for things like Unity messages: Start, Awake, Update, OnTriggerEnter, etc. They should always be marked virtual, unless you don’t plan to allow that class to ever be inherited from, in which case you usually mark the class sealed. Of course, on small or one-man teams you may overlook this, but on large teams or in distributed libraries/APIs it’s super important, so that users understand the intent of the class.

A big reason why sealed is important is that technically if I inherit from your class, I can declare a method with the same name as your method and just call it ‘new’. And if I’m stuck in a corner (and don’t understand the implications) I might feel forced to use new since I can’t actually override your method.

public class FooA : MonoBehaviour
{

    public void OnTriggerEnter(Collider other)
    {
        Debug.Log("FooA");
    }

}

public class FooB : FooA
{

    // 'new' hides FooA.OnTriggerEnter instead of overriding it
    public new void OnTriggerEnter(Collider other)
    {
        Debug.Log("FooB");
    }

}

In this case, unity will call FooB’s OnTriggerEnter, since it’s the first method with that name found when reflecting it out.

This also goes for the case where FooA marked it as private and I just defined FooB.OnTriggerEnter without new (since it’s private, I don’t need the new; private members are scoped only to the class they’re declared in). That is a good reason to always make your message handlers protected or public, unless the class is marked sealed, so that you force a warning on the user: if they plan to implement that message, they need to be aware that the class hierarchy already uses it.
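Outside of Unity’s message system, the difference between new and override is easy to see in plain C#. A minimal sketch, with made-up class and method names:

```csharp
// Hypothetical classes showing method hiding ('new') vs. overriding.
public class HidingBase
{
    public string Hidden() { return "Base"; }
    public virtual string Overridable() { return "Base"; }
}

public class HidingDerived : HidingBase
{
    // Hides the base method; which one runs depends on the reference type.
    public new string Hidden() { return "Derived"; }

    // Overrides the base method; the derived version runs regardless.
    public override string Overridable() { return "Derived"; }
}
```

Through a HidingBase reference to a HidingDerived instance, Hidden() runs the base version while Overridable() runs the derived one. That mismatch is exactly why new on a message handler tends to surprise people.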


I can’t quite figure out what this is supposed to mean. Don’t make last-minute decisions, but also don’t calculate something and discard it… If you don’t test whether you need the calc, how do you avoid the unnecessary calc?

On the plus side, skimming this thread convinced me to finally spend 30 or 40 minutes adding multicast delegation for a single manager-style Update instead of MonoBehaviour Update calls everywhere. The improvement was fairly significant. Good bang for the buck, in my case, at least at very high res on sub-optimal hardware (e.g. where it counts). Slowly but surely we’re shedding the baggage learned from all the tutorials…

Perhaps in one sense, delegating update calls could be a special-case example of what I think you’re trying to say in that quoted point above: “turn off” the call on a cached but currently unused instance, for example?
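For anyone wondering what a manager-style Update could look like, here is a rough sketch of the idea, not the exact code I ended up with. The names (UpdateManager, Register, Unregister, Tick) are made up, and it’s plain C# so it can be read on its own. In Unity you’d have one MonoBehaviour whose Update() calls Tick(Time.deltaTime).

```csharp
using System;

// Hypothetical manager; one real Update() would drive Tick() each frame.
public class UpdateManager
{
    // Multicast delegate: every registered callback is invoked in turn.
    private Action<float> _onTick;

    public void Register(Action<float> callback)
    {
        _onTick += callback;
    }

    public void Unregister(Action<float> callback)
    {
        _onTick -= callback;
    }

    public void Tick(float deltaTime)
    {
        var handler = _onTick;
        if (handler != null)
        {
            handler(deltaTime);
        }
    }
}
```

Unregistering is how you would "turn off" the call on a cached but currently unused instance: it just stops receiving ticks until it registers again.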

It might be better worded as: Structure your methods to exit as early as possible.

A contrived example:

public void OperateOnSomeGameObject(GameObject obj)
{
     float possibleUselessCalculation = obj.transform.position.magnitude;

     if (obj.activeSelf == true)
     {
          // do something with possibleUselessCalculation...
     }
}

If you know ahead of time that the calculation isn’t needed, then don’t perform it. In the above code, it’d be better to put the .magnitude calculation inside of the if block, which is where it’s used. These can sometimes be tricky to track down, especially in monster methods with excessive branching. That’s why it’s a good idea to keep methods small and easy to parse.
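One way to write the fixed version is with a guard clause that returns before any work is done. This is just a sketch: the wrapper class EarlyExitSketch, the variable name, and the Debug.Log call standing in for "do something" are my own placeholders.

```csharp
using UnityEngine;

// Hypothetical wrapper class; only the method body matters here.
public static class EarlyExitSketch
{
    public static void OperateOnSomeGameObject(GameObject obj)
    {
        // Bail out before doing any work the inactive case would throw away.
        if (!obj.activeSelf)
        {
            return;
        }

        // Only computed when it is actually going to be used.
        float distanceFromOrigin = obj.transform.position.magnitude;
        Debug.Log(distanceFromOrigin); // placeholder for the real work
    }
}
```

Moving the calculation inside the original if block does the same job. The early return just keeps the nesting flat when there are several conditions to check.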


@MV10 :
@kru nailed it. Here’s a more realistic example, using a player that cannot be controlled while in midair:

private void Update() {
  // Read input (Input.GetAxis returns a float)
  float mouseInputX = Input.GetAxis("Mouse X");
  float mouseInputY = Input.GetAxis("Mouse Y");
  float moveHorizontal = Input.GetAxis("Horizontal");
  float moveVertical = Input.GetAxis("Vertical");

  // Calculate the base values for the movement
  Vector3 desiredMove = MovementFromInput(moveHorizontal, moveVertical);
  Quaternion newPlayerRotation = PlayerRotationFromInput(mouseInputX, mouseInputY);

  // Apply environment based modifiers
  desiredMove = MovementSlopePenalty(desiredMove);
  desiredMove = FloorMaterialPenalty(desiredMove, GetCurrentFloorMaterial());

  // More logic
  ...

  // Apply when grounded
  if(IsGrounded()) {
    rigidbody.AddForce(desiredMove, ForceMode.Impulse);
    transform.rotation = newPlayerRotation;
  }
}

It makes a lot more sense to structure the code like this:

private void Update() {
  if(IsGrounded()) {
    // Read input (Input.GetAxis returns a float)
    float mouseInputX = Input.GetAxis("Mouse X");
    float mouseInputY = Input.GetAxis("Mouse Y");
    float moveHorizontal = Input.GetAxis("Horizontal");
    float moveVertical = Input.GetAxis("Vertical");

    // Calculate the base values for the movement
    Vector3 desiredMove = MovementFromInput(moveHorizontal, moveVertical);
    Quaternion newPlayerRotation = PlayerRotationFromInput(mouseInputX, mouseInputY);

    // Apply environment based modifiers
    desiredMove = MovementSlopePenalty(desiredMove);
    desiredMove = FloorMaterialPenalty(desiredMove, GetCurrentFloorMaterial());

    // More logic
    ...

    // Apply
    rigidbody.AddForce(desiredMove, ForceMode.Impulse);
    transform.rotation = newPlayerRotation;
  }
}

As you can see, you need to perform the grounded check anyway, so it’s much better to do it first, which lets you skip all the expensive player logic when the player is in midair.

Think of it this way:
Only do calculations if you know for sure that you will need them. If you don’t know whether that is the case, find out first.
Now, I don’t want to rule out the possibility of exceptions, but in general, this is how you should handle it.

I’d call that common sense :) … The original wording threw me, but I appreciate the clarification nonetheless.


@MV10 :
You say that, but this is a really, really simple example. As your code gets more and more complex, those problems get more and more difficult to spot.
The above example was a single method inside a single component. Once your processes start having dependencies across methods, classes, even components, this can get tricky, especially if your workflow is iterative. These are the kinds of issues you usually find while refactoring.

Common sense? Is that in the Asset Store? :P

Wow! The tip about MonoBehaviour is awesome! Just checked it myself. But man, managing objects in scene without having access to inspector is soooooo inconvenient.(((

As they say: Common sense ain’t so common.
