Hi there,
I’m posting this since it may be useful to someone else, and also as a way of verifying my conclusions. This is in no way meant as a “C is better” or “C# is better” post; it’s just data that I used to make a decision.
Based on the findings below, I decided to write a wrapper around the Chipmunk physics library rather than port it to C# (the summary is that the overhead of calling into C/Objective-C is negligible, particularly when compared to the performance of C#, which in this case measured roughly two orders of magnitude slower than C).
I also found out that either Unity isn’t using P/Invoke for DllImport, or, if it is, its performance is very similar to that of a Mono internal call.
So, here’s the code I used.
C#:
using UnityEngine;
using System.Collections;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
using System;

public class Hello
{
    static int val=0;

    //Bound from native code with mono_add_internal_call
    [MethodImplAttribute(MethodImplOptions.InternalCall)]
    public extern static int RetInt();

    //Bound with DllImport ("__Internal" because the native code is statically linked)
    [DllImport ("__Internal")]
    public extern static int RetIntSlow();

    [DllImport ("__Internal")]
    public extern static void TestIntC(int millions);

    public static int RetSqrtCSharp()
    {
        val++;
        return (int)Mathf.Sqrt(val);
    }
}

public class InteropCallDemo : MonoBehaviour
{
    delegate int IntegerRet();

    //Quick method to call an int return method some million times
    void CallMillionTimes(int numMillions,IntegerRet method)
    {
        DateTime time=System.DateTime.Now;
        int total=0;
        for (int n=0;n<numMillions*1000000;n++)
        {
            total+=method();
        }
        print("Total="+total); //Using total, so the compiler doesn't optimize and gets rid of the code above.
        DateTime now=System.DateTime.Now;
        System.TimeSpan diff=now.Subtract(time);
        print("Spent "+diff.TotalMilliseconds+" ms");
    }

    // Use this for initialization
    void Start ()
    {
        //Number of calls made to benchmark functions
        int numMillion=100;
        //Test mono_add_internal_call method
        print("intFast performance for "+numMillion+" million calls");
        CallMillionTimes(numMillion,Hello.RetInt);
        //Test DllImport method
        print("intSlow performance for "+numMillion+" million calls");
        CallMillionTimes(numMillion,Hello.RetIntSlow);
        print("Performance in C for a method that sqroots "+numMillion*100+" million times.");
        //Note: an argument of 1 makes the C side loop 1 x 1000000 x 100 = 100 million times;
        //it would need to be numMillion to match the count printed above.
        Hello.TestIntC(1);
        print("\nPerformance in C# for a method that sqroots "+numMillion+" million times.");
        CallMillionTimes(numMillion,Hello.RetSqrtCSharp);
    }
}
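C (built as C++/Objective-C++, hence the extern "C"):
#include <math.h>
#include <stdio.h>
#include <sys/time.h>
//mono_add_internal_call also needs a declaration from the Mono embedding API

extern "C" {
    int val=0;

    //Bound to Hello.RetInt through mono_add_internal_call below
    static int RetInt()
    {
        val++;
        return val;
    }

    //Bound to Hello.RetIntSlow through DllImport
    int RetIntSlow()
    {
        val++;
        return val;
    }

    int RetSqrtC()
    {
        val++;
        return (int)sqrt((double)val);
    }

    //Elapsed time from t2 to t1, in milliseconds
    int diff_ms(timeval t1, timeval t2)
    {
        return (((t1.tv_sec - t2.tv_sec) * 1000000) +
                (t1.tv_usec - t2.tv_usec))/1000;
    }

    void TestIntC(int numMillions)
    {
        timeval start, finish;
        gettimeofday(&start,NULL);
        int total=0;
        for (long count = 0; count <numMillions*1000000; count++)
        {
            //Inner loop multiplies the amount of work by 100
            for (long count2 = 0; count2 <100; count2++)
            {
                total+=RetSqrtC();
            }
        }
        gettimeofday(&finish,NULL);
        printf("\nTotal %d",total); //Using total, so the compiler doesn't optimize and gets rid of the code above.
        printf("\nSpent %d ms",diff_ms(finish, start));
    }
}

void mono_internal_initialize()
{
    mono_add_internal_call("Hello::RetInt", (void*)RetInt);
}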
The result:
intFast performance for 100 million calls
Spent 18765.762 ms
intSlow performance for 100 million calls
Spent 19340.145 ms
Performance in C for a method that sqroots 10000 million times.
Spent 15210 ms
Performance in C# for a method that sqroots 100 million times.
Spent 77821.272 ms
The times above are for an iPhone 4, running a release build.
So, 100 million calls to a C function took about 19 seconds, which makes the calls negligible in cost: the cost per call is around 0.00019 ms (190 ns). This means that even if I called some C method 1000 times per frame, it would only take 0.19 ms.
To make things even better, I’m using delegates here, which are most likely adding a noticeable overhead of their own (I haven’t checked the CIL code or timed it).
It was interesting to find that mono_add_internal_call performs basically the same as the DllImport method, which I was assuming used P/Invoke.
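The last two times are more indicative of why I’m choosing to write a wrapper for Chipmunk. Going by the counts printed above, C# is more than two orders of magnitude slower in a simple square root method (10000 million C iterations in about 15 seconds versus 100 million C# iterations in about 78 seconds). I was actually surprised by this, so surprised that I had to tweak the C code to do 100 times more calculations (when it did the same amount of work as the C# version, it took close to 0 seconds). One thing I’d like checked: as posted, Start() calls Hello.TestIntC(1) rather than Hello.TestIntC(numMillion), which would mean the C test only ran 100 million iterations and the gap is closer to 5x.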
Now I’m going to find out about the performance of UnitySendMessage versus unmanaged to managed thunks. Has anyone done any testing here?
Chipmunk has a lot of callbacks and I need to make sure there isn’t a bottleneck in UnitySendMessage.
Feel free to chime in, particularly if you disagree with any of my conclusions.
Cheers,
Alex