Hi,
While working on some real-time mesh-generation code that did hundreds of thousands of Vector3 operations per frame, I was surprised to find that the operators (*, +, …) of Vector3 (and other Unity structs) can be easily and massively optimized.
The current implementation of the * operator in Vector3 is:
public static Vector3 operator *(Vector3 a, float d)
{
    return new Vector3(a.x * d, a.y * d, a.z * d);
}
I ran a simple comparison test, which you can find here: dropb.in, and the result was:
When run 50000 times, Unity’s current operator took 18.9 ms to execute, while the optimized one took 2.5 ms.
The reason behind this difference is that the optimized version avoids an unnecessary call to the Vector3 constructor.
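The optimized operator itself did not survive in the text above, so here is a minimal sketch of what a constructor-free version could look like, based on the "local declaration and assignment" form mentioned later in this thread. `Vec3`, `MulCtor` and `MulFields` are hypothetical names for a stand-in struct, not Unity's actual code, so that the example compiles outside Unity.

```csharp
using System;

// Hypothetical stand-in for UnityEngine.Vector3 (assumption, not Unity's source).
public struct Vec3
{
    public float x, y, z;

    public Vec3(float x, float y, float z)
    {
        this.x = x;
        this.y = y;
        this.z = z;
    }

    // Constructor-based form, mirroring the operator quoted above.
    public static Vec3 MulCtor(Vec3 a, float d)
    {
        return new Vec3(a.x * d, a.y * d, a.z * d);
    }

    // Constructor-free form: declare a local and assign every field.
    // C# treats a struct local as definitely assigned once all of its
    // fields have been written, so no constructor call is needed.
    public static Vec3 MulFields(Vec3 a, float d)
    {
        Vec3 r;
        r.x = a.x * d;
        r.y = a.y * d;
        r.z = a.z * d;
        return r;
    }
}
```

Both methods return the same value; only the way the result struct gets filled in differs.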
Interesting, I never thought about checking the performance of Unity’s math library stuff.
I just tried out the operator you mentioned, and I do indeed get better performance when I roll my own struct and implement the operator as you suggest.
Unity’s Vector3 seems to take 30% more time for me, both inside the editor and inside a build (on 5.6.1f1).
Interesting. Considering that the current implementation is less verbose, and all that the constructor does is assign the corresponding fields, I wonder why the C# compiler doesn’t optimize this?
Have you tested this in standalone release mode? Maybe it only affects debug mode?
When you tested it in standalone did you disable development build/script debugging?
I wonder the same thing. It seems to me to be something the compiler can handle, but I suppose things are more complicated than what I imagine.
I confirm the optimization works in those conditions as well. I used a heavier version of the script above, and used Fraps to get the FPS count (to rule out any possible issue with Unity’s profiler), and here are the results:
Wow, I didn’t realize that running the constructor in this case would make that big of a difference. (I’m self-taught, BTW.) Thanks for this insight, I will keep it in mind when developing my stuff. Thanks for the info!
yep, a constructor function is just that… a function.
So it allocates a stack frame to call it.
If you don’t call the constructor though, it just allocates the memory needed for the struct with empty values.
This is why structs don’t allow field initializers; they MUST start with empty values. Whereas classes always have a constructor phase, so they don’t have this restriction.
…
I find this a minor optimization, probably a leftover from early Unity. I bet it came about because the Unity devs were all C++ programmers first and foremost, and so didn’t really consider the inner workings of the Mono CLR. But it is an area of optimization that could potentially give a little oomph, since vector construction is very common.
Yeah, I knew that the constructor is basically a glorified method, but that sheds some light on how C# allocates its stack frames, so thank you for that info!
Yup, no development build here. I measured the times using the .NET Stopwatch class.
One would think that the compilers (either the C# one or the JIT) should be able to inline this constructor call, but apparently they just don’t.
Considering that working with Vector3s and other math structs is quite common in many games, optimizing these operators would provide a nice benefit, and it doesn’t even look like a lot of work ^^
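For anyone who wants to repeat the measurement, here is a rough sketch of a Stopwatch-based comparison. It is not the original test script (which is not reproduced in this thread); `Vec3`, `MulCtor`, `MulFields` and `OperatorBenchmark` are hypothetical names, and the timings will depend on the runtime and build settings.

```csharp
using System;
using System.Diagnostics;

// Hypothetical stand-in for UnityEngine.Vector3 (assumption).
public struct Vec3
{
    public float x, y, z;

    public Vec3(float x, float y, float z) { this.x = x; this.y = y; this.z = z; }

    // Constructor-based multiply.
    public static Vec3 MulCtor(Vec3 a, float d)
    {
        return new Vec3(a.x * d, a.y * d, a.z * d);
    }

    // Constructor-free multiply: assign each field of a local.
    public static Vec3 MulFields(Vec3 a, float d)
    {
        Vec3 r;
        r.x = a.x * d;
        r.y = a.y * d;
        r.z = a.z * d;
        return r;
    }
}

public static class OperatorBenchmark
{
    const int Iterations = 50000; // same iteration count as quoted in the first post

    public static void Main()
    {
        Vec3 v = new Vec3(1f, 2f, 3f);
        float sink = 0f; // accumulate results so the work cannot be discarded

        // Warm-up pass so JIT compilation is not part of the timings.
        for (int i = 0; i < 1000; i++)
        {
            sink += Vec3.MulCtor(v, 1.0001f).x + Vec3.MulFields(v, 1.0001f).x;
        }

        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < Iterations; i++)
        {
            sink += Vec3.MulCtor(v, 1.0001f).x;
        }
        sw.Stop();
        Console.WriteLine("constructor:  " + sw.Elapsed.TotalMilliseconds + " ms");

        sw = Stopwatch.StartNew();
        for (int i = 0; i < Iterations; i++)
        {
            sink += Vec3.MulFields(v, 1.0001f).x;
        }
        sw.Stop();
        Console.WriteLine("field assign: " + sw.Elapsed.TotalMilliseconds + " ms");

        // Print the accumulator so it counts as used.
        Console.WriteLine("sink: " + sink);
    }
}
```

Inside Unity the same loops could be run from a MonoBehaviour against the real Vector3, in a non-development build, as discussed above.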
The object initializer method does also avoid the call to the constructor, but it still has two extra instructions, the important one being an initobj call, which is going to cause a bit of extra work to be done in the form of it initializing all the values of the struct to zero or null. So while that should still be a lot better than the call to the constructor, the local declaration and assignment still wins out.
I’m really surprised the CLR doesn’t optimize this initobj call out if it detects you’re assigning to every value in the struct.
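To make the comparison concrete, here is a small sketch of the object initializer form next to the local declaration form. `Vec3` and `ScaleVariants` are hypothetical names used so the example compiles outside Unity; the remarks about initobj in the comments restate the post above rather than a measured result.

```csharp
using System;

// Hypothetical stand-in for UnityEngine.Vector3 (assumption).
public struct Vec3
{
    public float x, y, z;

    public Vec3(float x, float y, float z) { this.x = x; this.y = y; this.z = z; }
}

public static class ScaleVariants
{
    // Object initializer: skips the three-argument constructor, but the
    // struct is default-initialized first (the initobj mentioned above)
    // and only then are the fields stored.
    public static Vec3 ViaInitializer(Vec3 a, float d)
    {
        return new Vec3 { x = a.x * d, y = a.y * d, z = a.z * d };
    }

    // Local declaration and assignment: no constructor and no explicit
    // default initialization, just three field stores.
    public static Vec3 ViaLocal(Vec3 a, float d)
    {
        Vec3 r;
        r.x = a.x * d;
        r.y = a.y * d;
        r.z = a.z * d;
        return r;
    }
}
```

The generated IL can be checked with a disassembler such as ildasm or ILSpy to confirm which instructions each variant produces on a given compiler.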
When working with huge amounts of vectors, it can be beneficial to avoid them altogether. Just define three floats instead; a Vector3 is still a new object that is being created.
I wonder why they cannot just fix the aforementioned request rather than create a whole new library that you have to know about, get, and integrate? I appreciate that a release of Unity (which is presumably what would be required) is no small matter. However, this seems such a fundamental part of a 3D platform that it is reasonable to expect an efficient implementation.
But then maybe I am simply missing something here.
You are missing something
That mathematics library isn’t the “solution” to this tiny little problem here, it’s completely unrelated to it. That mathematics library is designed to help ensure highly efficient compilation of your complex vector/matrix/etc… math in general, helping it be tightly packed and memory efficient in the burst compiler.
That mathematics library will be integrated in Unity… It’s just that it’s quite beta right now so people who want to mess with it right now can do so through the repository and also help find bugs or contribute improvements (at some point potentially).
Ah, ok. I’ve got my wires crossed. That means my vote for improved struct performance may not have been wasted then - assuming that ever gets looked at.
I rewrote the IL of some of Unity’s DLLs and measured the performance of a few applications. My conclusion was that Unity Technologies can achieve quite some performance improvements with very little work, using trivial changes only, without changing anything in user code.
Yes, they do provide a new math lib, but to make use of it, you need to change your project. This probably gives better performance, but it might not be a trivial change. Therefore, if Unity just changed some simple code in their Vector classes, every existing Unity project would benefit from those changes automagically.
The new Unity mathematics library definitely has its benefits, and they are greater than those of the optimizations this forum thread is about. But using that library means you have to modify/rewrite parts of your code. The Vector3 (and similar) optimization works with 0 modifications to your code.
What kills me the most is knowing that this optimization should hardly take Unity’s developers more than a man-day to implement, which is peanuts given the performance increase it brings (Peter77 spoke here about a 4% increase in his game). Given that people at Unity are aware of this optimization (suggestion ticket + me writing to them), the most probable explanation I can see is that the internal organization of the company has become so complicated that making such simple, useful modifications has become a daunting task.