Built-in Math still slow! (faster implementations here)

Some of you may know that a while ago, another user ran some tests and found that some math functions (primarily the Lerp functions) were slower than they needed to be. He wrote faster implementations that did the same thing as the built-in ones, which resulted in Unity updating the built-in functions to run faster.

I have run some tests to see if I could write faster implementations of some other Mathf functions, and it turned out that I could.

Abs (float) - 86% faster
Abs (int) - 105% faster
Sign (int) - 60% faster
Clamp01 (int) - 63% faster
Repeat - 291% faster
Approximately - 5209% faster!

The Approximately function especially could get super fast! It seems to have become slower since 2.6, because I ran the same test script on 2.6 and there my implementation was "only" something like 250% faster.
It almost seems like I must have written faster, but incorrect, code.

It looks like it is mostly the functions operating on ints that are a lot slower than they could be.

Here’s a webplayer if you want to see for yourself - Mathf Performance Test – Arongranberg.com
The script executes the functions 50000 times and measures the time it took for them to execute, then it compares it to my implementation and calculates a % value (100% equals no difference).
All values are random (refreshed every loop) and range from -100000 to 100000… I think.
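The measurement described above can be sketched roughly like this. This is a minimal sketch, not the attached test script: the class name `MathBenchmarkSketch`, the helper names, and `FastAbs` are all made up for illustration, and unlike the original test the random inputs are generated up front so that the random number generator is not part of the timed loop.

```csharp
using System;
using System.Diagnostics;

public static class MathBenchmarkSketch
{
    // Hypothetical candidate implementation, used only as an example to time.
    public static float FastAbs(float x)
    {
        return x < 0f ? -x : x;
    }

    // Random inputs in roughly [-100000, 100000], generated before timing starts.
    public static float[] RandomInputs(int count, int seed)
    {
        Random random = new Random(seed);
        float[] values = new float[count];
        for (int i = 0; i < count; i++)
        {
            values[i] = (float)(random.NextDouble() * 200000.0 - 100000.0);
        }
        return values;
    }

    // Calls func `iterations` times and returns the elapsed Stopwatch ticks.
    public static long Time(Func<float, float> func, float[] inputs, int iterations)
    {
        float sink = 0f; // accumulate results so the calls cannot be discarded
        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            sink += func(inputs[i % inputs.Length]);
        }
        watch.Stop();
        if (float.IsNaN(sink)) Console.WriteLine(sink); // keeps sink observable
        return watch.ElapsedTicks;
    }

    // Baseline time as a percentage of candidate time (100 = no difference).
    public static double RelativeSpeedPercent(long baselineTicks, long candidateTicks)
    {
        return 100.0 * baselineTicks / Math.Max(1L, candidateTicks);
    }
}
```

A run would time something like `x => Math.Abs(x)` and `MathBenchmarkSketch.FastAbs` over the same inputs with 50000 iterations each and pass both tick counts to `RelativeSpeedPercent`. Results from a single short run like this are noisy, so repeating it a few times is probably a good idea.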

I have attached the script if anyone wants to take a look (had to attach it as .txt because the uploader didn’t want to upload a .cs file).

419314–14545–$MathfTest.txt (8.02 KB)

Abs, Sign, Clamp (int), Approximately and Repeat are faster here; the rest are slower.

Sturestone, can you test it on other hardware?

I remember that 2-3 months ago I was trying to optimize a low-level C algorithm and I tried two different optimizations: one of them was much faster on my current desktop and the other one was faster on my 5-year-old laptop. So it is possible that your optimizations are faster on your computer but slower on other computers.

I have tested it on the two computers I have available, one old iMac and one new iMac. The implementations I wrote about in the first post were faster on both.

I posted a link to a webplayer so you can test it yourself, post your results!

I’m at my Ubuntu desktop at this moment, so I can’t check it now (no webplayer for Linux :(). I’ll try it later from my MacBook Pro :wink:

Here are my results, a bit of a mixed bag of improvements and losses:

I’m on:
Windows XP Service Pack 3, 32bit
Intel Core 2 E8400 (3.0 GHz)

Yup, I’m glad not all of my implementations were faster, otherwise Unity would have done a really bad job with Mathf.

Well that’s depressing to hear, especially for something like max() on an integer. I’d guess/hope there’ll be reasons for this beyond poor implementation.

I think your clamp methods can be faster if you put a single line like:

return a > c ? c : a < b ? b : a;

(similarly for Clamp01)

You avoid the re-assignment of "a" and the extra checks if "a" is greater than "c". Although, I don’t know the current Unity behaviour if the user supplies garbage (min greater than max).
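Written out as full methods, the single-expression form could look like the sketch below. The class name `ClampSketch` and the parameter names are made up for illustration, and the code simply assumes min <= max; it makes no claim about what Mathf does when min is greater than max.

```csharp
public static class ClampSketch
{
    // Checks the upper bound first, so a value above max returns
    // without ever being compared against min.
    // Assumes min <= max.
    public static float Clamp(float value, float min, float max)
    {
        return value > max ? max : value < min ? min : value;
    }

    // Same pattern with the bounds fixed at 0 and 1.
    public static float Clamp01(float value)
    {
        return value > 1f ? 1f : value < 0f ? 0f : value;
    }
}
```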

And the Lerp method could probably be made faster if the Mathf.Clamp call was skipped and its content copy/pasted into the method (avoid invoking the nested method), but that might not be the case unless the custom Clamp logic can be made equal/faster than the built-in Mathf.

Thanks FizixMan

This is what I am getting now.
My implementations seem to be at least as fast as the built-in ones now.

Updated the webplayer.

419443--14548--$Skärmavbild 2010-11-01 kl. 17.25.40.png

Hey! Thanks for looking into this. :slight_smile:

Some comments:

Abs (float)
Abs (int)

Unity calls Mono’s Math.Abs internally, so there is added method calling overhead. This is silly, so tomorrow I’m going to implement it so that we don’t waste time by calling into Mono.

Sign (float)
Clamp01 (float)

Unity’s implementation is faster because you are doing comparison of floats vs ints. And then casting clamped results from int to float.

Sign (int)
Clamp01 (int)

Your implementation is faster because we don’t have overloads for ints, so there are implicit casts. I’m going to add overloads tomorrow.
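To illustrate the point about implicit casts: when only a float version exists, an int argument is converted to float on the way in and the result has to be cast back to int by the caller. A sketch of what separate int versions might look like is below. The class name `IntOverloadSketch` is made up, this is not Unity's actual code, and the convention that Sign returns 1 for zero is an assumption of the sketch.

```csharp
public static class IntOverloadSketch
{
    // Float version: with only this one available, Sign(someInt)
    // converts the argument to float and returns a float.
    public static float Sign(float value)
    {
        return value >= 0f ? 1f : -1f;
    }

    // Int version: the compiler picks this for int arguments,
    // so no int-to-float conversion happens.
    public static int Sign(int value)
    {
        return value >= 0 ? 1 : -1;
    }

    public static float Clamp01(float value)
    {
        return value > 1f ? 1f : value < 0f ? 0f : value;
    }

    // Int version of Clamp01: the result can only be 0 or 1.
    public static int Clamp01(int value)
    {
        return value > 1 ? 1 : value < 0 ? 0 : value;
    }
}
```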

Clamp (float)
Clamp (int)
Clamp01 (float)
Clamp01 (int)

All of your clamps are doing an unnecessary ternary operation if the value is greater than the max.

Min (float)
Min (int)
Max (float)
Max (int)
Lerp (float)

These are identical to our implementations. And your test shows approx 100% on these, so that makes sense.

Approximately (float)

Your implementation degrades at larger values. For example, if both a and b are very big, but they are different by one bit, they’d logically be approximately the same. Your implementation would likely return false since adding/subtracting such a small constant would not be representable in such a low floating point resolution.

Also, it’s very dangerous for us to change Approximately, since it may break backwards compatibility.
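The large-value problem can be reproduced with a small sketch. Neither method below is the implementation from the test script or Unity's own; `FixedEpsilon` is a hypothetical stand-in for a comparison that adds and subtracts a small constant, and `RelativeTolerance` is one common alternative with arbitrarily chosen constants. Around 100000000 neighbouring floats are 8 apart, so two values that differ by a single bit fail the fixed-constant check.

```csharp
using System;

public static class ApproximatelySketch
{
    // Hypothetical fixed tolerance, chosen only for illustration.
    private const float Epsilon = 1e-5f;

    // Compares against a fixed window around a. For large a the window
    // is smaller than the gap between adjacent floats.
    public static bool FixedEpsilon(float a, float b)
    {
        return b > a - Epsilon && b < a + Epsilon;
    }

    // Scales the tolerance with the magnitude of the inputs, with a small
    // absolute floor so that values near zero still compare sensibly.
    public static bool RelativeTolerance(float a, float b)
    {
        float scale = Math.Max(Math.Abs(a), Math.Abs(b));
        float tolerance = Math.Max(1e-6f * scale, 1e-7f);
        return Math.Abs(a - b) <= tolerance;
    }
}
```

With `a = 100000000f` and `b = 100000008f` (the next representable float), `FixedEpsilon` reports the values as different while `RelativeTolerance` treats them as approximately equal.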

Repeat (float)

Your implementation does not work with negative values.

As long as you’re improving math functions, can you take a look at SpeedLerp? Mathf.Lerp, Mathf.InverseLerp, Mathf.SmoothStep, Vector2.Lerp, Vector3.Lerp, Vector4.Lerp, and Color.Lerp are all consistently slower than my own implementations across G5, Xeon, and ARM processors. It would be nice if you could make that script obsolete. :wink:

–Eric

Mind sharing your script?

InverseLerp and SmoothStep are a bit more involved so I’ll need to spend some time looking at our implementation, but quickly looking over our Lerps I’m not sure how they can be sped up. I’m curious what you’re doing to make them faster.


Pretty great to see Unity dev team interested in the little optimisations too :slight_smile: keep it up!

Yep, I linked to it in my previous message (SpeedLerp).

–Eric

Oh, I didn’t notice the underline. Maybe we should make that a bit more readable :slight_smile:

Looks like the only difference between the Lerps is that your implementation early outs when the value is outside of 0-1 range (instead of clamping) which is fair enough, but the worst case scenarios seem to be identical.
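The two approaches being compared could be sketched as follows. This is neither the SpeedLerp source nor Unity's code, just an illustration of "clamp, then interpolate" versus "return early when t is out of range"; the class and method names are made up.

```csharp
public static class LerpSketch
{
    // Clamps t to [0, 1] first and always performs the interpolation.
    public static float LerpClamped(float from, float to, float t)
    {
        t = t > 1f ? 1f : t < 0f ? 0f : t;
        return from + (to - from) * t;
    }

    // Returns an endpoint straight away when t is outside (0, 1),
    // skipping the multiply and add in those cases.
    public static float LerpEarlyOut(float from, float to, float t)
    {
        if (t <= 0f) return from;
        if (t >= 1f) return to;
        return from + (to - from) * t;
    }
}
```

When t is inside the 0-1 range both versions do the same two comparisons plus the interpolation, which matches the observation that the worst cases are identical.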

i5 750

419739--14562--$speed.jpg

Ah. I knew it was too fast to be correct :wink:
Just out of curiosity, how does your implementation work?

And btw, why is U3's implementation of Approximately so much slower than U2.6's implementation?
When running the test on U3 I get 5000%, but on 2.6 I get around 300%. It might be my implementation that magically got faster on U3, though.

Ah, yeah, missed that.
≈200% faster is quite a lot though, I shall see if I can get it to work faster than yours and still get it to work with negative values.

…

Okay, it seems like I can’t find a faster implementation of Repeat.
Here’s what I got, it’s running at about 90-100% on Unity 2.6 (haven’t tested it on U3).

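public static float Repeat (float a, float b) {
	// C#'s % keeps the sign of the dividend, so a negative a gives a remainder in (-b, 0].
	if (a < 0F) {
		// Shift the negative remainder up into (0, b].
		return b + (a % b);
	}
	return a % b;
}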

It doesn’t return exactly the same values as the built-in, though: when sending (-20, 5), Mathf returns 0 but my implementation returns 5. Both are correct though (0 <= 5 <= 5).
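For comparison, here is a sketch of two variants that return 0 for (-20, 5): a floor-based one and a modulo-based one that only shifts the remainder when it is actually negative. Both assume length > 0. The names are made up for illustration and neither is claimed to be the built-in implementation.

```csharp
using System;

public static class RepeatSketch
{
    // Floor-based: subtracts the largest multiple of length that is <= t.
    // Assumes length > 0.
    public static float RepeatFloor(float t, float length)
    {
        return t - (float)Math.Floor(t / length) * length;
    }

    // Modulo-based: C#'s % keeps the sign of the dividend, so the remainder
    // is shifted up only when it is strictly negative. Exact multiples of
    // length therefore give 0 instead of length.
    // Assumes length > 0.
    public static float RepeatModuloWrapped(float t, float length)
    {
        float remainder = t % length;
        return remainder < 0f ? remainder + length : remainder;
    }
}
```

With these, (-20, 5) gives 0, and (-3, 5) and (7, 5) both give 2. For exact negative multiples the modulo variant can return negative zero, which still compares equal to 0.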

I would also comment on the documentation regarding Repeat, nothing big but anyway.

It says:

Doesn’t the modulo operator work with floating point numbers?
I am using it in my implementation, it works great.

[Edit] Never mind about this, this is not the right place to comment on it