IL2CPP super SLOW vs Mono Standalone

Hello guys, I'm working on a WebRTC implementation in pure C#.

Last week I finally finished my H264 decoder (100% C#), but I'm facing a super strange problem when building the project with IL2CPP.

In a Mono build I can achieve around 30~40 fps while decoding a 1920x1020 H264 stream, but when I switch to IL2CPP I can only reach half of that (around 15~20 fps).

I tried everything in the Project Settings (C++ Compiler Configuration set to Master, Strip Engine Code, etc.), but it seems that IL2CPP can't handle heavy CPU-bound byte[] operations well.

Here are my results after trying everything:

// Tested in Unity 2019.4.1f1 with 8 threads on an Intel i7-6700HQ
--------------------------------------
| MONO          | 38.42 frames/sec   |
--------------------------------------
| IL2CPP        | 19.55 frames/sec   |
--------------------------------------

Does anyone have a tip for improving performance with IL2CPP? Is Mono really that much better optimized for heavy byte[] operations?

PS: I already tried replacing byte[] with unsafe byte* access, and changing for (int i = 0; i < array.Length; i++) to for (int i = 0; i < n; i++), but I noticed no improvement in the final decoding speed.
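For reference, the two loop shapes described above look roughly like this (the class and method names are hypothetical, not taken from the decoder). Both compute the same result. RyuJIT, for example, recognizes the `i < data.Length` pattern and can drop the per-access bounds check, so caching the length in a local is not expected to help much there.

```csharp
// Hypothetical helpers illustrating the two loop forms from the post.
public static class LoopForms
{
    // Loop bound read from data.Length in the loop condition.
    public static int SumUsingLength(byte[] data)
    {
        int sum = 0;
        for (int i = 0; i < data.Length; i++)
            sum += data[i];
        return sum;
    }

    // Loop bound cached in a local variable before the loop.
    public static int SumUsingCachedLength(byte[] data)
    {
        int sum = 0;
        int n = data.Length;
        for (int i = 0; i < n; i++)
            sum += data[i];
        return sum;
    }
}
```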

EDIT:
Well, I made another benchmark with a simple loop of 600,000,000 iterations, and it seems that IL2CPP is super slow with generics. The weird thing is that I only use byte[] and MemoryStream in the H264 operations… maybe it's related to MemoryStream, or Array.Copy / Buffer.BlockCopy?
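One way to narrow down these suspects is to time each of them in isolation, in both a Mono and an IL2CPP build. The sketch below is a minimal Stopwatch harness with one hypothetical method per candidate (Buffer.BlockCopy, Array.Copy and MemoryStream.Read); it makes one warm-up call first so that JIT compilation on Mono is not counted. Numbers from a harness like this describe only the isolated operation, not the whole decoder.

```csharp
using System;
using System.Diagnostics;
using System.IO;

// Hypothetical micro-benchmark harness (names are illustrative only).
public static class MicroBench
{
    // Runs the action once as a warm-up, then returns the average
    // wall-clock time per run in milliseconds.
    public static double MeasureMs(Action action, int runs)
    {
        action(); // warm-up: excludes one-time JIT cost on Mono
        Stopwatch sw = Stopwatch.StartNew();
        for (int r = 0; r < runs; r++)
            action();
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds / runs;
    }

    // Candidate 1: Buffer.BlockCopy (offsets and count are in bytes).
    public static void CopyWithBlockCopy(byte[] src, byte[] dst)
    {
        Buffer.BlockCopy(src, 0, dst, 0, src.Length);
    }

    // Candidate 2: Array.Copy (count is in elements).
    public static void CopyWithArrayCopy(byte[] src, byte[] dst)
    {
        Array.Copy(src, dst, src.Length);
    }

    // Candidate 3: reading the same bytes back out of a MemoryStream.
    public static void CopyWithMemoryStream(byte[] src, byte[] dst)
    {
        using (MemoryStream ms = new MemoryStream(src))
        {
            ms.Read(dst, 0, src.Length);
        }
    }
}
```

For example, `MicroBench.MeasureMs(() => MicroBench.CopyWithArrayCopy(src, dst), 1000)` returns the average milliseconds per copy for a given pair of buffers.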

Can you share this project via a bug report? We would love to have a look at the behavior and improve the performance with IL2CPP here.

Thanks @JoshPeterson
I already submitted a bug report today.

Thanks! Can you let me know the bug report number?

@JoshPeterson The number is 1264028.

My investigation also revealed that one generic method that simplifies BlockCopy calls adds a huge overhead in IL2CPP:

ArrayHelper.BlockCopy

Maybe generics don't perform well in IL2CPP?

EDIT:

Okay, I finally found that all the problems were related to functions taking IList<T> as a parameter.
After changing IList<T> to T[], the performance problems in IL2CPP are fixed (and it now runs faster than Mono).
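The actual ArrayHelper.BlockCopy is not shown in the thread, so the following is only a hypothetical reconstruction of the two kinds of signature. With IList<T> parameters every element read and write is an interface call, even when the caller passes a T[] (arrays implement IList<T>). With T[] parameters the element access is direct, and the whole copy can be handed to Array.Copy.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helpers; not the author's real ArrayHelper.
public static class CopyHelpers
{
    // IList<T> parameters: each src[...] and dst[...] below is an
    // interface call, even if the caller passes arrays.
    public static void CopyList<T>(IList<T> src, int srcIndex, IList<T> dst, int dstIndex, int count)
    {
        for (int i = 0; i < count; i++)
            dst[dstIndex + i] = src[srcIndex + i];
    }

    // T[] parameters: the copy is delegated to Array.Copy,
    // with no per-element interface dispatch.
    public static void CopyArray<T>(T[] src, int srcIndex, T[] dst, int dstIndex, int count)
    {
        Array.Copy(src, srcIndex, dst, dstIndex, count);
    }
}
```

A decoder that only ever passes byte[] can call the array version directly; the IList<T> version is only worth keeping for callers that genuinely pass other list types.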

EDIT 2: .NET Core still seems to run about twice as fast as the final IL2CPP build. Is that expected? The RyuJIT compiler seems to be much faster than Unity's.

EDIT 3:

The Unity Editor runs 20 times slower than the IL2CPP build, so my code is unusable inside the Editor.

I think the slow IList<T> looping isn't generics being slow, it's interfaces.
A method like:
void Loop<T>(IList<T> list) { }
receives an interface and uses virtual calls that can't be inlined, unless you're lucky and the entire method gets inlined away.

I'd be interested to see how it performs if you adjust it to
void Loop<T, U>(U list) where U : IList<T> { }
as I believe that will instantiate a function for every list type you pass in, getting rid of the virtual calls (unless your U is itself an interface? I don't know if that works) and making inlining easier for the compiler.
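A sketch of the constrained-generic idea, under two assumptions: IL2CPP shares one generated implementation across reference-type generic arguments (so with U = byte[] the calls would still go through interface dispatch), while value-type arguments normally get their own specialized code. To keep it short, it uses a hypothetical two-member IByteSource interface instead of the full IList<T>, plus a struct wrapper around byte[]. Whether the calls actually get inlined depends on the compiler and build settings.

```csharp
// Hypothetical minimal read-only interface (stands in for IList<T>).
public interface IByteSource
{
    int Count { get; }
    byte this[int index] { get; }
}

// Hypothetical struct wrapper over byte[]: as a value-type generic
// argument it gives the compiler a concrete type to specialize for.
public struct ArraySource : IByteSource
{
    private readonly byte[] _data;
    public ArraySource(byte[] data) { _data = data; }
    public int Count => _data.Length;
    public byte this[int index] => _data[index];
}

public static class ConstrainedLoop
{
    // TSource is a constrained type parameter, so a struct argument is
    // not boxed; Count and the indexer are emitted as constrained calls.
    public static int Sum<TSource>(TSource source) where TSource : IByteSource
    {
        int sum = 0;
        int count = source.Count;
        for (int i = 0; i < count; i++)
            sum += source[i];
        return sum;
    }
}
```

Usage would be `ConstrainedLoop.Sum(new ArraySource(buffer))`, where the type argument is inferred as ArraySource.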

Nice tip, let me try your suggestion.

Thanks for the updates. We will investigate this.

Regarding the editor performance - what version of Unity are you using? You may need to ensure that debug code generation is disabled. In versions of Unity prior to 2020.1 (I think that version is correct), code generation is debug by default, and can be changed in the editor preferences. Later versions of Unity switched to release code generation in the editor by default.

Thanks for the reply @JoshPeterson, I'm using Unity 2019.4.

Where can I find this option? I never saw this setting in the editor preferences. I tried to find this "debug code generation" option but couldn't.

You can find it in the Edit > Preferences > External Tools dialog (on Windows, on macOS I believe it is Unity > Preferences > External Tools). You want to disable the “Editor Attaching” option.

You will need to restart the editor for this change to take effect. Once you do that, the JIT will emit release code, but you won’t be able to attach the managed debugger to the editor until you enable this option and restart the editor again.

In 2020.1, this workflow is much better - you get release code generation by default. Then when a debugger is attached, the editor will prompt you to switch to debug code generation. It can do this without restarting.

Thanks @JoshPeterson, I will upgrade to Unity 2020 to get this better workflow. Thank you for your time.

@Zuntatos I tested your approach, but it runs very slowly too.

The problem is with IList<T>… even if I pass the parameter as U with U : IList<T>, the methods called are still the IList<T> interface methods, even when I pass, for example, a byte[]. The performance is the same as using plain IList<T> as the parameter.

The issue was accepted by Unity and is now being tracked.

I wanted to follow up on this issue, as I've just had a chance to investigate it. As @Zuntatos mentioned, the issue here is not generics, but the interface calls. IL2CPP has a worse algorithm (in terms of run-time performance) than Mono for interface method calls. So I would recommend avoiding tight loops that make interface method calls if at all possible.
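A minimal sketch of one way to follow that advice without changing a public IList<byte> signature (the names are hypothetical): check once, before the loop, whether the argument is really a byte[], and run the hot loop over the array in that case. Only the fallback branch pays for an interface call per element.

```csharp
using System.Collections.Generic;

// Hypothetical example: keeps the reusable IList<byte> signature
// but adds an array fast path outside the loop.
public static class FastPath
{
    public static int Sum(IList<byte> list)
    {
        // One type check per call, not per element.
        if (list is byte[] array)
        {
            int sum = 0;
            for (int i = 0; i < array.Length; i++)
                sum += array[i]; // direct array access
            return sum;
        }

        // Fallback for other IList<byte> implementations.
        int total = 0;
        int count = list.Count;
        for (int i = 0; i < count; i++)
            total += list[i]; // interface call per element
        return total;
    }
}
```

This keeps callers unchanged at the cost of duplicating the loop body, and it only helps when most callers actually pass arrays.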

It's great that you got back to us, but this seems like a pretty painful limitation (it's easy to work around, but the workaround is likely at the expense of your code's reusability).

Is there any idea if or when this might change?

When can we expect that to be fixed? Having to avoid interfaces obviously is a major issue, some of us have client projects that are already live and at scale that use interfaces extensively as there was no information beforehand to say otherwise. Rewriting these would not be a small task, and would be at a serious cost to our business not only financially but in terms of time.

I don’t have an ETA for this work, sorry. As with anything else, we need to weigh the cost of improving this against other priorities. If you have specific cases where this is causing a performance problem, we would love to have a look at them. If we have enough data, that may raise the priority of doing this work, so I would recommend profiling code that you think might be impacted by IL2CPP’s interface method invocation algorithm.

Microbenchmarks like the one discussed here are important for testing and improving specific performance issues. But we’ve found that benchmarks like this are not indicative of whole program performance. While interface method calls are slow, they usually make up a very small fraction of the overall time spent during program execution. So while improving them would be positive, there are often other changes that can provide more benefit.

But again, data is the best way to inform decisions like this, so I’d love to see any profiling information. Thanks!

Thanks. Unfortunately I don't think I have the rights to do that with our codebase, but I will talk to management and see if we can put some reports together for you so that you have some data. Thanks for getting back to us!

I also want to emphasize that interfaces are fairly important for us - and sometimes they’re unavoidable. It’s good to know that they can potentially cause performance issues, so we will take a closer look at the parts of our code which use them extensively (e.g. our networking implementation).

I don't understand why an issue like this is marked as "By Design". It makes no sense; users should be able to upvote it.

He explains what you can do to give this issue more priority in this post:
https://discussions.unity.com/t/800338/16