Why is calling a function slower than just doing the operation?

I have noticed that, in some cases, I get worse performance when performing an operation using a function call than when doing the operation without any function calls. Here is the code used to demonstrate this:

using System.Collections;
using System.Collections.Generic;
using UnityEngine;

public class Test : MonoBehaviour {

    // Use this for initialization
    void Start () {

        Debug.Log ("Testing math function call 10 times");
        for (int i = 0; i < 10; i++) {
            Test_FunctionCall ();
        }

        Debug.Log ("Testing NO function call 10 times");
        for (int i = 0; i < 10; i++) {
            Test_NoFuncCall ();
        }
    }

    void Test_FunctionCall () {

        float then = Time.realtimeSinceStartup;
       
        for (int x = 0; x < 100; x++) {
            for (int y = 0; y < 100; y++) {
                for (int z = 0; z < 100; z++) {

                    int result = DoMath (x, y, z);
                }
            }
        }

        float now = Time.realtimeSinceStartup;
        Debug.Log ("Did math 1000000 times in only: " + (now - then) * 1000 + "ms");
    }

    void Test_NoFuncCall () {

        float then = Time.realtimeSinceStartup;

        for (int x = 0; x < 100; x++) {
            for (int y = 0; y < 100; y++) {
                for (int z = 0; z < 100; z++) {
                   
                    int result = x * y + (z << 2);
                }
            }
        }

        float now = Time.realtimeSinceStartup;
        Debug.Log ("Did math 1000000 times in only: " + (now - then) * 1000 + "ms");
    }

    int DoMath (int x, int y, int z) {
        return x * y + (z << 2);
    }
}

On average, calling the DoMath function 1000000 times takes roughly 55ms to complete. Meanwhile, doing the same math without calling the function takes roughly 23ms, saving about 30ms.

What’s weird about this is that the compiler usually performs trivial optimizations (such as changing x * 2 to a bitwise operation). Inlining DoMath feels just as trivial, yet the compiler isn’t doing it. I know I shouldn’t make a big deal about microscopic optimizations, but a difference of roughly 30 nanoseconds per call becomes significant on an operation that is being done a million times.

So I have two questions: why is the first test slower than the second, and is there any way to tell the compiler to take the “x * y + (z << 2)” operation outside of the DoMath function to get the same performance that I got in the second test?

Well, without talking about optimization but about the code directly… It needs to call a function, pass the arguments (3 in total in your specific case), execute the operations and return a value. All of that takes a little bit of time… doing it in a tight loop (e.g. for a performance test) can thus reveal a significant difference.

First of all, you should try this using the release configuration, as many optimizations only apply to release builds and not to debug builds. Never do serious, final performance or reliability comparisons using debug builds. With a release build, the results could be much closer to each other, or even roughly the same.

Secondly, yes, the compiler might inline short functions. Recent .NET versions even let you mark methods with MethodImplOptions.AggressiveInlining, but that’s probably not available in Unity (at least not on the stable 5.x versions; maybe in the 2017 version).
So if the JIT compiler happens to decide to inline that method, you’d be lucky and might get away with better performance.
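
For what it’s worth, here is a minimal sketch of what the inlining hint looks like on a runtime that supports it (.NET Framework 4.5 or later). The MathOps class and the SumAll helper are hypothetical names made up for illustration, and making DoMath static is my assumption, not something from the original script. Whether your Unity version accepts the attribute at all depends on its scripting runtime.

```csharp
using System.Runtime.CompilerServices;

// Hypothetical helper class, not part of the original Test script.
public static class MathOps
{
    // Asks the JIT to inline this method where possible.
    // This is only a hint: the runtime is free to ignore it.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static int DoMath(int x, int y, int z)
    {
        return x * y + (z << 2);
    }

    // Same triple loop as in the question, but the results are summed
    // so the work cannot simply be discarded as unused.
    public static long SumAll()
    {
        long sum = 0;
        for (int x = 0; x < 100; x++)
        {
            for (int y = 0; y < 100; y++)
            {
                for (int z = 0; z < 100; z++)
                {
                    sum += DoMath(x, y, z);
                }
            }
        }
        return sum;
    }
}
```

Even with the attribute in place, you’d still have to measure in a release build to see whether it changed anything.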

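Inlining is not generally considered a trivial optimization, and there is a whole science about it: the compiler has to weigh the saved call overhead against larger code size, and it decides using heuristics. There are patents covering particular inlining techniques, but the basic optimization itself is implemented by essentially every modern compiler.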

Also, C# has no “inline” keyword. The closest thing is the MethodImplOptions.AggressiveInlining option, which only arrived in .NET 4.5, later than the runtime Unity uses, and even that is only a suggestion to the JIT.

Further, if you are using any target that needs IL2CPP, the intermediate IL code is translated into C++ code before it is compiled to native code, so it’s gonna be even harder to speculate what can and cannot be inlined.
