Generics + il2cpp performance analysis

Hey,

I’ve recently been playing around with Unity’s performance analysis tools and decided to check how well IL2CPP handles generic structs, so I wrote this small test:

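    using NUnit.Framework;                 // [Test]
    using Unity.PerformanceTesting;        // [Performance], Measure
    using static UnityEngine.Random;       // InitState, Range

    public class CodeInjectionViaGenerics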
    {
        private const int NUM_ITERATIONS = 500_000;

        public interface ICompute
        {
            float Compute(ref S s);
        }

        public struct Computer : ICompute
        {
            public float Compute(ref S s) => 2*s.x;
        }

        public static class StaticComputer
        {
            public static float Compute(ref S s) => 2*s.x;
        }

        public struct S { public float x; }

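        // Variant A: the computation is injected through the struct type argument T.
        public class A
        {
            public S s;
            public float Compute<T>() where T : struct, ICompute =>
                       default(T).Compute(ref s);
        }

        // Variant B (baseline): the same computation as a direct static call.
        public class B
        {
            public S s;
            public float Compute() => StaticComputer.Compute(ref s);
        }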
      
        [Test, Performance]
        public void ProfileA()
        {
            InitState(102745);
            var data = new A[NUM_ITERATIONS];
            for (int i = 0; i < NUM_ITERATIONS; i++)
            {
                // Float overload; Range(0, 1) would pick the int overload (exclusive max) and always return 0.
                data[i] = new A { s = new S { x = Range(0f, 1f) } };
            }
          
            Measure.Method(() =>
            {
                double sum = 0;
                for (int i = 0; i < NUM_ITERATIONS; i++)
                {
                    sum += data[i].Compute<Computer>();
                }
            })
            .MeasurementCount(100)
            .IterationsPerMeasurement(100)
            .WarmupCount(2000)
            .Run();
        }      
      
        [Test, Performance]
        public void ProfileB()
        {
            InitState(102745);
            var data = new B[NUM_ITERATIONS];
            for (int i = 0; i < NUM_ITERATIONS; i++)
            {
                // Float overload; Range(0, 1) would pick the int overload (exclusive max) and always return 0.
                data[i] = new B { s = new S { x = Range(0f, 1f) } };
            }
          
            Measure.Method(() =>
            {
                double sum = 0;
                for (int i = 0; i < NUM_ITERATIONS; i++)
                {
                    sum += data[i].Compute();
                }
            })
            .MeasurementCount(100)
            .IterationsPerMeasurement(100)
            .WarmupCount(2000)
            .Run();
        }
    }

The idea is to check how viable it is to inject functionality into a class through generic type arguments, a well-known trick in C#. The results I got were quite surprising. I profiled the code in Unity 2020.20f1, but I would expect the same results in 2020.1 and 2019 LTS.

Here are the results (median only; I can post the other statistics if anyone is interested, but there is nothing notable in them):

| Configuration | ProfileA | ProfileB |
| --- | --- | --- |
| Editor, Release mode | 2.47 ms | 2.52 ms |
| Windows Standalone IL2CPP, Release | 1.94 ms | 0.30 ms |
| Windows Standalone IL2CPP, Master | 0.26 ms | 0.28 ms |

This was definitely not what I was expecting: something really funky is happening with IL2CPP in the Release configuration. So I dug around, read the generated IL2CPP code and found that if I add [MethodImpl(MethodImplOptions.AggressiveInlining)] to A::Compute, everything works as expected and both implementations are (approximately) equally fast.

Windows Standalone IL2CPP, Release, with AggressiveInlining: ProfileA → 0.28 ms, ProfileB → 0.26 ms
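A minimal sketch of that change, with the nested types from the test flattened to top level so it compiles outside Unity. Only `A.Compute<T>` gains the attribute; the other types are as in the test.

```csharp
using System.Runtime.CompilerServices;

public struct S { public float x; }

public interface ICompute
{
    float Compute(ref S s);
}

// The behavior injected into A through the struct type argument.
public struct Computer : ICompute
{
    public float Compute(ref S s) => 2 * s.x;
}

public class A
{
    public S s;

    // The only change from the test: ask for aggressive inlining so the
    // IL2CPP Release build no longer pays for an out-of-line generic call.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public float Compute<T>() where T : struct, ICompute =>
        default(T).Compute(ref s);
}
```

Call sites do not change: `data[i].Compute<Computer>()` works exactly as before.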

I know the Unity developers are aware of this issue, since I got the AggressiveInlining idea from reading their own generic code, and I roughly understand how the situation came about. I also realize this is a very simplified measurement setup, but it illustrates the problem very clearly.

So here are a couple of questions that someone might be able to answer:

  1. Is anybody aware of any other issues with generics and IL2CPP?
  2. Are there other ways to coax IL2CPP into handling generics better?
  3. If any Unity developers are reading this: is this something that might be improved in the future?

I am planning on writing an animation library that relies quite heavily on generics, structs and interfaces. If somebody has any additional tips, I would really appreciate it.

Have fun!

I think that you are seeing the advantages of inlining here.

When IL2CPP generates C++ code, any generic methods or methods on generic types are put into a specific group of GenericMethod??.cpp or Generics??.cpp files, respectively (here the ?? represents some number - IL2CPP will generate many .cpp files, each with a different number). Any non-generic code that uses these generic methods is put into a generated .cpp file based on its assembly name, e.g. Assembly-CSharp??.cpp.

By default, the C++ compiler cannot inline method calls across two .cpp files. When you mark a method with [MethodImpl(MethodImplOptions.AggressiveInlining)], IL2CPP generates a copy of that method in each .cpp file where it is used, which allows the C++ compiler to inline it.

If you compile with the “Master” build configuration on Windows Standalone, link time optimization will be enabled. This causes the C++ compiler to delay inlining decisions until the linker runs. The linker has knowledge of the entire program, and can optimize across .cpp files. So for best performance at run time, it is usually a good idea to use the “Master” build configuration.
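For reference, a minimal editor-script sketch that switches Windows Standalone to the IL2CPP backend with the Master configuration (the same setting is exposed in Player Settings as "C++ Compiler Configuration"). It assumes the Unity 2020-era `PlayerSettings` overloads that take a `BuildTargetGroup`; the class name and menu path are hypothetical, and the script has to live in an `Editor` folder because it uses the UnityEditor API.

```csharp
// Must be placed in an Editor folder: UnityEditor is not available in player builds.
using UnityEditor;

public static class Il2CppMasterConfig // hypothetical helper name
{
    [MenuItem("Tools/Use IL2CPP Master for Standalone")] // hypothetical menu path
    public static void Apply()
    {
        // Build the Standalone player with IL2CPP instead of Mono.
        PlayerSettings.SetScriptingBackend(BuildTargetGroup.Standalone, ScriptingImplementation.IL2CPP);

        // Master is the configuration that enables link-time optimization on Windows.
        PlayerSettings.SetIl2CppCompilerConfiguration(BuildTargetGroup.Standalone, Il2CppCompilerConfiguration.Master);
    }
}
```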


@JoshPeterson, thank you for the detailed answer.
By looking at the generated il2cpp code, I was able to infer most of what you wrote, but it is still nice to get an official answer from the Unity team.

Here are two additional questions:

  1. Does IL2CPP support dead code elimination and constant value propagation for generics? I did a couple of tests, and it doesn’t seem to, but my tests may have been flawed.
  2. I noticed that IL2CPP only reacts to [MethodImpl(MethodImplOptions.AggressiveInlining)]; other options such as NoInlining don’t seem to have any effect. Is this true, or is something wrong with my tests?

No, IL2CPP does not do much in the way of standard compiler optimizations like these. It generally translates IL directly to C++ and lets the C++ compiler handle optimization.

IL2CPP does support MethodImplOptions.NoInlining by emitting the associated C++ compiler attribute for the given compiler. But the C++ compiler is not required to adhere to that attribute, so most compilers will ignore it and inline anyway in the Release or Master build configurations.

IL2CPP also supports the MethodImplOptions.NoOptimization option, by emitting #pragma directives that tell the C++ compiler not to optimize that specific function.
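A small sketch of how these options are applied, on hypothetical methods (`InliningHints` and its members are made up for illustration). They are hints about code generation only, and per the answer above the C++ compiler may still inline a `NoInlining` method in Release or Master builds.

```csharp
using System.Runtime.CompilerServices;

public static class InliningHints // hypothetical example class
{
    // IL2CPP emits the C++ compiler's "no inline" attribute for this method.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static float Twice(float x) => 2 * x;

    // IL2CPP surrounds this method with #pragma directives that disable C++ optimization.
    [MethodImpl(MethodImplOptions.NoOptimization)]
    public static float TwiceUnoptimized(float x) => 2 * x;

    // MethodImplOptions is a flags enum, so the options can be combined.
    [MethodImpl(MethodImplOptions.NoInlining | MethodImplOptions.NoOptimization)]
    public static float TwiceNoInlineNoOpt(float x) => 2 * x;
}
```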

