How much faster is evaluating an animation curve as opposed to calculating a relatively long formula?

In cars, the front wheels follow what’s called “Ackermann steering geometry”.
I am trying to find a way to implement this accurately in Unity, and the formula is full of multiplications, divisions and arctan calculations.
Considering that two such calculations are required per car, if there are 10 unique cars in the level all making the same calculations, that’ll cost a lotta performance over time.
The following is an example of the formula. It’s pretty long.
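For reference, the standard Ackermann relation for the two front-wheel angles can be sketched like this (variable names are illustrative, not my exact code):

```csharp
using System;

class AckermannSketch
{
    // Sketch of the standard Ackermann relation. For wheelbase L,
    // rear track width w and turning radius r, the inner wheel turns
    // more sharply than the outer one:
    //   innerAngle = atan(L / (r - w/2)),  outerAngle = atan(L / (r + w/2))
    public static (double inner, double outer) Angles(
        double wheelBase, double trackWidth, double turnRadius)
    {
        double inner = Math.Atan(wheelBase / (turnRadius - trackWidth * 0.5));
        double outer = Math.Atan(wheelBase / (turnRadius + trackWidth * 0.5));
        return (inner, outer);
    }
}
```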

I’ve been thinking of ways to optimize this, and two solutions come to mind:

  • Just directly use the formula to calculate the angle of the steering wheels every frame.

  • This is the most straightforward way to do it, and I fear it might be unnecessarily slow: a bunch of nested calculations, two multiplications, two divisions and an atan calculation.

  • But… could it be fast? From what I can tell, AnimationCurves do a bunch of fancy Bézier curve calculations, and chances are that evaluating the curve may be slower than just calculating the values directly.

  • During startup, generate an AnimationCurve by sampling several values of steerInput over [-1, 1], then evaluate that AnimationCurve to get the angle of the steering wheels.

  • From what I can tell, animation curves are relatively cheap, and if I am able to get a large payoff from doing this, I’d be pretty happy!

  • If this is truly more efficient, I do wonder: What number of keys would be the best balance between accuracy and performance?

  • I also wonder: Should I make one static curve for each car make that every instance of that make evaluates from, as opposed to generating a curve for every single car? I wonder how much memory a curve occupies.

Of these options, which do you think is the most efficient in terms of performance? Or do you think the performance benefit of the latter is negligible?

AnimationCurves should have O(log n) evaluation time, where n is the number of keys, but you need to test this assumption.

The way I would code it would be to take the interpolant and find the keys to its left and right. For this to work optimally, I’d make sure the key positions were already normalized and sorted in memory beforehand. This means you need, at worst, a binary search, which is O(log n).

However, when you use this on a hot path, the evaluation of the curve is probably more expensive than the formula you’ve shown. After finding the keys, a new spline-segment interpolant is computed, and only then can the curve finally be evaluated.
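To illustrate, here’s a sketch of what a keyframe-curve evaluation roughly involves (not Unity’s actual AnimationCurve implementation):

```csharp
using System;

class CurveSketch
{
    public struct Key { public float Time, Value, InTangent, OutTangent; }

    public static float Evaluate(Key[] keys, float t)
    {
        // 1. Binary search for the segment containing t: O(log n).
        int lo = 0, hi = keys.Length - 1;
        while (hi - lo > 1)
        {
            int mid = (lo + hi) / 2;
            if (keys[mid].Time <= t) lo = mid; else hi = mid;
        }

        // 2. Re-normalize t into the segment, then evaluate a cubic
        //    Hermite spline -- a handful of multiplications on top of
        //    the search, which is where the extra cost comes from.
        Key a = keys[lo], b = keys[hi];
        float dt = b.Time - a.Time;
        float u = (t - a.Time) / dt;
        float u2 = u * u, u3 = u2 * u;
        return (2 * u3 - 3 * u2 + 1) * a.Value
             + (u3 - 2 * u2 + u) * dt * a.OutTangent
             + (-2 * u3 + 3 * u2) * b.Value
             + (u3 - u2) * dt * b.InTangent;
    }
}
```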

“two multiplications, two divisions and an atan calculation” is actually brutally fast. I don’t know what exactly you mean by “nested calculations”, but if you want to streamline it further, I’d recommend multiplying by cached reciprocals instead of dividing (if at all possible), plus a look-up table (LUT) for atan, which lets you introduce a custom trade-off between precision, memory, and performance.
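A minimal atan LUT could look like this (table size and input range are illustrative; tune both to your precision and memory budget):

```csharp
using System;

static class AtanLut
{
    // Precompute atan over a fixed input range, then sample with linear
    // interpolation. A bigger table costs memory but improves precision.
    const int TableSize = 256;
    const float MaxInput = 4f;            // inputs are clamped to [-4, 4]
    static readonly float[] Table = Build();

    static float[] Build()
    {
        var t = new float[TableSize];
        for (int i = 0; i < TableSize; i++)
            t[i] = (float)Math.Atan(i * MaxInput / (TableSize - 1));
        return t;
    }

    public static float FastAtan(float x)
    {
        // atan is odd, so compute for |x| and mirror the sign at the end.
        float ax = Math.Min(Math.Abs(x), MaxInput);
        float f = ax / MaxInput * (TableSize - 1);
        int i = (int)f;
        float v = i >= TableSize - 1
            ? Table[TableSize - 1]
            : Table[i] + (Table[i + 1] - Table[i]) * (f - i);
        return x < 0f ? -v : v;
    }
}
```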

Regardless of what you do, the best way to be sure is to benchmark both solutions and establish a reliable performance metric. Do not assume. My hunch, for example, is the complete opposite of yours: AnimationCurves aren’t exactly cheap and involve just as many (or more) divisions and powers.


Man, I’ve been thinking about this so hard that it never occurred to me that I just… need to calculate it once. The values to the left of steeringInput in the formula never change during gameplay, so all I need is to calculate that part once and store it in a variable!
Chances are that there’s no reason to optimize this unless loading times take too long. But if it is as “brutally fast” as you describe, optimizing it may not even have a noticeable impact.
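In sketch form, the idea is to do the expensive work once at startup (field names here are hypothetical, and scaling the cached extremes linearly by the input is an approximation of re-running the full formula):

```csharp
using System;

class SteeringCache
{
    // Sketch: hoist the constant part of the formula out of the
    // per-frame path. The exact split depends on the real formula.
    readonly double _maxInnerAngle, _maxOuterAngle;

    public SteeringCache(double wheelBase, double trackWidth, double minTurnRadius)
    {
        // The expensive atan work happens once, at startup.
        _maxInnerAngle = Math.Atan(wheelBase / (minTurnRadius - trackWidth * 0.5));
        _maxOuterAngle = Math.Atan(wheelBase / (minTurnRadius + trackWidth * 0.5));
    }

    // Per-frame work is now just two multiplications.
    public (double inner, double outer) AnglesFor(double steerInput)
        => (steerInput * _maxInnerAngle, steerInput * _maxOuterAngle);
}
```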

Not long ago I finished a programming class that focused strongly on optimizing code, emphasizing multiplications, divisions and trig expressions as being slow. I was so fixated on trying to speed up the calculation at runtime that I never stopped to think about whether I could avoid repeating the calculation at all.

Thanks for your reply though! I probably wouldn’t have ever realized. Still, I will take your advice for future projects.

By cached reciprocals, you mean something like multiplying by 0.5 instead of dividing by 2, right? Also, look-up tables sound pretty useful - this is the kind of data structure I was picturing when considering the animation curve. I’ll see if I can generate a look-up table instead of an animation curve if it becomes necessary in the future.

Yes. But please consider that doing this is only really necessary for ultra-optimized calculations, when your intention is to run things as tightly as possible in a critical path. Given that high-level languages provide no guarantees that the low-level compilation will be truly optimal, you sometimes have to ‘nudge’ the compiler in the right direction. That said, please don’t become superstitious from my advice, as compilers will likely already replace your divisions by 2.0f with a corresponding multiplication by 0.5f. So take it all as a technical nudge for extreme cases, not as advice for a regular routine, and never ever compromise the clarity of your code.

And obviously, test your assumptions. Maybe a LUT would be slower in your case, due to some unfathomable cache misses or memory-sync issues? Who knows. It all works in theory; you need to benchmark it anyway.

Speaking of nudging the compiler, you may also check out the aggressive-inlining feature in the System.Runtime.CompilerServices namespace, which may (under unspecified circumstances, but frequently enough) optimize the compiled code to reduce function calls, yet another way to trim stack-frame overhead in a critical path. This is especially useful for hinting static libraries and low-level math.

[MethodImpl(MethodImplOptions.AggressiveInlining)]
public void MyFrequentlyCalledFunction() {
  // ...
}

However, I believe that expression-bodied members enjoy this kind of optimization anyway, so you might want to adjust your coding style to get the most out of it.

public int trueIndex(int index, bool appending = false)
  => (_start + (_rev? -index : index)).Mod(Points + (appending? 1 : 0));

// performs (and reads) significantly better than
public int trueIndex(int index, bool appending = false) {
  if(_rev) index = _start - index; else index = _start + index;
  var p = Points;
  if(appending) p++;
  return index.Mod(p);
}

Here’s an example of a true modulo (one that always returns a non-negative result):

[MethodImpl(MethodImplOptions.AggressiveInlining)]
static public int Mod(this int n, int m) => (m < 1)? 0 : (n %= m) < 0? n + m : n;

Also, don’t fall into the trap of thinking that branching hurts your performance. Caching and branch prediction are a thing.

This branchless variant will consistently perform nearly two times slower (even though it uses integer division instead of %):

[MethodImpl(MethodImplOptions.AggressiveInlining)]
static public int Mod(this int n, int m) {
  n -= m * (n / m);
  n += m & (n >> 31);
  return n;
}