Vector3 and other structs optimization of operators

This is a thread necro, but for the sake of argument this can be optimized further:

public static Vector3 operator *(in Vector3 a, float d) { ... }

Note the in. It avoids an unnecessary copy of a, since you aren't changing it anyway. Good find on avoiding the constructor call; I wouldn't have thought of that.

Be careful with the in keyword, because it only behaves the way you expect with immutable (readonly) structs. With an ordinary (mutable) struct, the compiler will force a defensive copy, which not only defeats the point of the keyword but can also make many functions slower than they need to be.
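To make the defensive-copy behavior concrete, here's a minimal sketch; the type and method names are illustrative, not from Unity or the BCL:

```csharp
// Illustrative types showing when the compiler emits a hidden
// defensive copy for an `in` parameter.
public struct MutableVec
{
    public float x;
    public float Magnitude() => x < 0 ? -x : x; // non-readonly instance method
}

public readonly struct ImmutableVec
{
    public readonly float x;
    public ImmutableVec(float x) => this.x = x;
    public float Magnitude() => x < 0 ? -x : x; // cannot mutate: struct is readonly
}

public static class DefensiveCopyDemo
{
    public static float A(in MutableVec v)
    {
        // The compiler cannot prove Magnitude() leaves `v` unchanged,
        // so it copies `v` before the call: the very copy that `in`
        // was supposed to avoid happens anyway.
        return v.Magnitude();
    }

    public static float B(in ImmutableVec v)
    {
        // readonly struct: no defensive copy is needed.
        return v.Magnitude();
    }
}
```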

While we're here: if you want performant vectors (for specific heavy lifting like procedural meshing etc.), make sure to 1) make them readonly, 2) implement the IEquatable<> interface (with custom hashing), and 3) optionally introduce better operators, implicit casting to/from legacy types, and/or a static interface for ease of use (given they're now readonly).
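A minimal sketch of points 1-3 might look like this; Vec3 is an illustrative name, not an existing type, and HashCode.Combine assumes a runtime where System.HashCode is available:

```csharp
using System;

// readonly struct with value equality and custom hashing, per the advice above.
public readonly struct Vec3 : IEquatable<Vec3>
{
    public readonly float X, Y, Z;

    public Vec3(float x, float y, float z) { X = x; Y = y; Z = z; }

    // Custom equality avoids the slow reflection-based ValueType.Equals fallback.
    public bool Equals(Vec3 other) => X == other.X && Y == other.Y && Z == other.Z;
    public override bool Equals(object obj) => obj is Vec3 v && Equals(v);

    // Custom hashing so the struct behaves well as a dictionary/hash-set key.
    public override int GetHashCode() => HashCode.Combine(X, Y, Z);

    // `in` is safe here because the struct is readonly: no defensive copy.
    public static Vec3 operator *(in Vec3 a, float d) => new Vec3(a.X * d, a.Y * d, a.Z * d);

    // Implicit casts to/from a legacy type (e.g. UnityEngine.Vector3) would go here.
}
```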


This was mentioned before, albeit seemingly unacknowledged at the time. I suspect one answer is backwards compatibility, plus not being able to get the by-reference overload called when both are present, since the by-value overload appears to be preferred (example).

That's not much of a concern with mere field accesses, as would be used in the operator in question. You're making it sound like defensive copies are always enforced in such cases on one side of the call or the other, which isn't the case. Defensive copies only apply where an operation on something declared read-only is considered to have the potential to mutate the instance: mainly use of non-readonly instance properties and non-readonly instance methods. Here's an example for fun.
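For illustration, here's a sketch of the field-access case being described (the types are hypothetical): plain field reads through an `in` parameter do not trigger a defensive copy, even on a mutable struct.

```csharp
public struct PlainVec // deliberately NOT readonly
{
    public float x, y, z;
}

public static class FieldAccessDemo
{
    public static PlainVec Scale(in PlainVec a, float d)
    {
        // Field reads only: the compiler knows these cannot mutate `a`,
        // so no defensive copy is made despite the struct being mutable.
        PlainVec r;
        r.x = a.x * d;
        r.y = a.y * d;
        r.z = a.z * d;
        return r;
    }
}
```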


Forgot to answer, but this actually was the case; I've tested it thoroughly. The compiler made defensive copies preemptively whenever it saw a mutable struct, with no further analysis whatsoever. Maybe this behavior has changed over the years, though.

I'm definitely glad if it did change, because I was very interested in this and managed to find a full description of why such analysis isn't as simple as it sounds (I think it was from Lippert himself). Maybe this was in pre-Roslyn times, but C# certainly retains some design oddities in favor of compilation speed, robustness, or safety; not everything is always logical or reasonable.

Edit:
Anyway, I've just reread what you said more carefully. Yes, the user-defined contracts matter, but my point was that the code isn't analyzed to determine whether the struct is actually modified inside the method.

My previous comment was intended as a heads-up for anyone who would naively assume that in is basically ref, but better, because it communicates intent more clearly. That assumption is quite typical, because people don't expect the feature to be designed this way; it really should "seal" the struct, or at the very least require the in value type to be declared readonly.

in is not the same as ref, and it's in fact a rare sight unless the codebase is designed to make particular use of it. I'd say ref is still the better option in the context of video games, unless you really want to guard against mutability in some edge cases. (I was also speaking strictly about passing structs as function arguments, which is where this matters most imo, not about using structs for readonly properties and so on, although that's a legitimate argument and a legitimate way to avoid copying the struct around.)
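The difference in intent can be sketched like this (Particle is an illustrative type):

```csharp
public struct Particle { public float x; }

public static class RefVsIn
{
    // `ref`: passed by reference, callee may mutate; caller must write `ref`.
    public static void Advance(ref Particle p, float dt) => p.x += dt;

    // `in`: passed by reference, but the callee may NOT assign to it;
    // `in` is optional at the call site, so calls look like by-value calls.
    public static float Sample(in Particle p) => p.x;
    // p.x = 0f; // would not compile inside Sample: `p` is a readonly variable
}
```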

Here’s an in-depth overview of this feature.

As I said, defensive copies only apply on use of non-readonly instance properties and methods. Readonly instance members were introduced in C# 8 (September 2019; in parameters arrived earlier, in C# 7.2, November 2017), and C# 8 has been available in Unity since 2020.2.
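C# 8's readonly instance members let a mutable struct opt individual members out of defensive copies; a quick sketch with illustrative names:

```csharp
public struct Velocity
{
    public float x, y, z;

    // Marked readonly: calling this through an `in` parameter or a
    // readonly field does not force a defensive copy.
    public readonly float Magnitude() =>
        (float)System.Math.Sqrt(x * x + y * y + z * z);

    // Not readonly: calling this through a readonly reference would
    // still operate on a hidden defensive copy.
    public void Clear() { x = 0; y = 0; z = 0; }
}
```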

Came across this thread randomly from the FRB GitHub, which I followed from the UAS. I just ran a simple benchmark with the example multiplication, and I'm getting 25-35% worse performance with this 'optimization', both in the Editor and in a build, for both Vector3 and float3, with the Mono backend. Am I just stupid? Here's the example code:

public static Vector3 Mult(Vector3 v, float d)
{
	Vector3 result;
	result.x = v.x * d;
	result.y = v.y * d;
	result.z = v.z * d;
	return result;
}

public static float3 MultF(float3 v, float d)
{
	float3 result;
	result.x = v.x * d;
	result.y = v.y * d;
	result.z = v.z * d;
	return result;
}

void Update()
{
	// vectors
	sw.Restart();
	for (int i = 0; i < count; i++)
	{
		Vector3 v = vectors[i] * 100f;
	}
	sw.Stop();
	// store the time

	// floats
	sw.Restart();
	for (int i = 0; i < count; i++)
	{
		float3 v = floats[i] * 100f;
	}
	sw.Stop();
	// store the time

	// vectors optimized
	sw.Restart();
	for (int i = 0; i < count; i++)
	{
		Vector3 v = Mult(vectors[i], 100f);
	}
	sw.Stop();
	// store the time

	// floats optimized
	sw.Restart();
	for (int i = 0; i < count; i++)
	{
		float3 v = MultF(floats[i], 100f);
	}
	sw.Stop();
	// store the time
}

Sections 1 and 2 perform pretty much the same, and so do sections 3 and 4. However, sections 3 and 4 are ~30% slower than sections 1 and 2.

When run 50000 times, Unity's current operator took 18.9 ms to execute, while the optimized one took 2.5 ms.

When run 50k times, the operator takes about 0.42ms, and the function takes about 0.57ms.
When run 500k times, the operator takes about 4.2ms, and the function takes about 5.7ms.

I ran your code on Unity 2022.3.45f1, and the optimized versions came out faster for me. Did you by any chance invert the results? Here is the exact script I used:

using System.Diagnostics;
using System.Runtime.CompilerServices;
using Unity.Mathematics;
using UnityEngine;
using Debug = UnityEngine.Debug;

public class test : MonoBehaviour
{
    private Stopwatch sw = new Stopwatch();
    public int count = 500000;

    public static Vector3 Mult(Vector3 v, float d)
    {
        Vector3 result;
        result.x = v.x * d;
        result.y = v.y * d;
        result.z = v.z * d;
        return result;
    }

    public static float3 MultF(float3 v, float d)
    {
        float3 result;
        result.x = v.x * d;
        result.y = v.y * d;
        result.z = v.z * d;
        return result;
    }

    void Update()
    {
        var vectors = new Vector3[count];
        var floats = new float3[count];

        // vectors
        sw.Restart();
        for (int i = 0; i < count; i++)
        {
            Vector3 v = vectors[i] * 100f;
        }
        sw.Stop();
        Debug.Log($"Vector3: {sw.ElapsedTicks / 10000f} ms");

        // floats
        sw.Restart();
        for (int i = 0; i < count; i++)
        {
            float3 v = floats[i] * 100f;
        }
        sw.Stop();
        Debug.Log($"float3: {sw.ElapsedTicks / 10000f} ms");



        // vectors optimized
        sw.Restart();
        for (int i = 0; i < count; i++)
        {
            Vector3 v = Mult(vectors[i], 100f);
        }
        sw.Stop();
        Debug.Log($"Vector3 optimized: {sw.ElapsedTicks / 10000f} ms");

        // floats optimized
        sw.Restart();
        for (int i = 0; i < count; i++)
        {
            float3 v = MultF(floats[i], 100f);
        }
        sw.Stop();
        Debug.Log($"float3 optimized: {sw.ElapsedTicks / 10000f} ms");
    }
}

And here are the results

How does this script do on your setup?

I'm on 2022.3.28f1. My own script also just adds a simple ring buffer for averages.
Here are the results from my script:

I pasted in your exact script just to be safe and the results are pretty much identical:


I tested it on the same version (2022.3.28f1) and I get the opposite effect from you:


So the issue seems to come from something other than the Unity version.
Can you tell me which CPU and OS you use?

Forgot to respond, sorry. I’m on Windows 10 on an i7-6700k.

Inlining may be affecting the results. Try testing with MethodImplOptions.AggressiveInlining enabled.

using System.Runtime.CompilerServices;

...

[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static Vector3 Mult(Vector3 v, float d)
{
    Vector3 result;
    result.x = v.x * d;
    result.y = v.y * d;
    result.z = v.z * d;
    return result;
}

[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static float3 MultF(float3 v, float d)
{
    float3 result;
    result.x = v.x * d;
    result.y = v.y * d;
    result.z = v.z * d;
    return result;
}

Here are some quick recommendations for profiling to ensure you get representative numbers:

  1. Profile a Build/Player, not Editor code
    Always profile your game in a built Player rather than within the Editor. The performance of code running in the Editor can be misleading.

  2. Use IL2CPP Instead of the Mono Scripting Backend
    When creating a Player, opt for IL2CPP rather than the Mono scripting backend, as there can be significant performance differences between the two.

  3. Disable Script Debugging
    When building your Player, make sure to disable script debugging, as the debugging features can slow down your code.

  4. Turn Off Development Mode
    Create your Player with Development Mode turned off, since this mode often adds extra error checks that can impact performance. Note that this also disables support for the Unity Profiler, so it is not always applicable.


Thank you for these recommendations, with which I agree in the general case. In the case of the optimizations discussed here, I believe that two of these recommendations are too restrictive:

1- Profile a Build/Player, not Editor code
The discussed optimizations have value in edit mode too. In fact, I discovered this optimization while optimizing one of my editor tools. So if for some reason this optimization seems not to work in edit mode in some specific context, I am interested in pinpointing the cause.

2- Use IL2CPP Instead of .NET Scripting Backend
The discussed optimizations are nullified when targeting IL2CPP. The reasons were discussed on this thread 4 years ago (the discussion starts at this post). I have not checked whether anything has changed since.
Regardless, the Editor uses Mono, and in some cases IL2CPP is not an option, so optimizing for the Mono backend still has value.

That is weird, I have a similar setup. Can you please:

  1. Do a Mono build and check whether the optimization works for you in a build.
  2. Send me a minimal Unity project where the optimization does not work in edit mode.

Thank you.

Valid point. I hadn't thought of suggesting this test because in my tests it changed nothing, but maybe it will in the case of @TheDemiurge

This was it. AggressiveInlining.

I tested an IL2CPP build just to see, and was surprised to see it perform 20-40 times faster. I'm not sure if the code is just getting extremely optimized, or if it's due to how arrays work in C++, or…

Or if it's just literally optimizing away the contents of the loop, because the data isn't really going anywhere. I added another local variable just to sum with v, and while that more than doubled the times under Mono, it didn't change the times at all in the next IL2CPP build.
So I added a member variable plus a function to sum into it instead, and that made a big difference:
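Roughly what that change might look like, building on the benchmark script above (sink, Consume, and Bench are illustrative names); making the result observably escape the loop is a common way to keep the optimizer from deleting the loop body:

```csharp
float sink; // member variable: its value survives past the loop

void Consume(in Vector3 v) => sink += v.x + v.y + v.z;

void Bench()
{
    sw.Restart();
    for (int i = 0; i < count; i++)
    {
        Vector3 v = Mult(vectors[i], 100f);
        Consume(v); // the result is now used, so the compiler
                    // cannot optimize the multiplication away
    }
    sw.Stop();
}
```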

An interesting thing came up when I accidentally caused a conversion from float3 to Vector3: that made it 4-5x slower, far slower than everything else.

So anyway, that’s that. It was just the aggressive inlining.
