Vector3: I don’t blame the compiler or Unity. A method call that actually gets made is slower than not making one, regardless of the engine, language, or compiler. The CPU has to do extra work for the call itself, so it’s best to skip it where you can. Adding a couple of numbers together directly in code is going to be at least as fast as calling some function that does the same thing, and usually faster unless the compiler happens to inline the call. It’d still be slower whether or not you did that as an “out” or there was a “new” in there. A function call takes time regardless.
I went to fairly absurd levels in the boat simulator, but it was worth it. I think in terms of groups of identical computations that need to be done and then do them all in a single function call, preferably on memory that is packed together in a single array. I have several functions that compute different types of forces (hydrodynamic, aerodynamic, skin friction, buoyancy, and whatever), but they do it on the entire batch of triangles at one time.
First I might compute all the buoyancy forces on all triangles, then all the hydrodynamic forces, etc. Each type of force is a relatively small function, so as it cycles through those triangles in a simple loop, the CPU cache can be taken advantage of. Oftentimes that alone can significantly speed up your code. So instead of calling something like ComputeHydroForce() on a single triangle, then calling ComputeBuoyancyForce() on that triangle, etc., I do ComputeHydroForces() (note the “s” at the end) one time, which computes that force for all of the triangles, then do ComputeBuoyancyForces() (again with an “s” at the end) one time, which computes that force for all the triangles, then the next force, etc. The CPU cache likes that because it’s not jumping in and out of different functions as it visits each triangle, and I do one function call instead of thousands.
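To make the batching idea concrete, here is a minimal sketch of one "per force type" pass. Everything in it is an assumption for illustration: `Vec3` is a stand-in for Unity's Vector3 so the snippet compiles on its own, the parameter layout is invented, and the arithmetic is a placeholder rather than the simulator's real buoyancy model.

```csharp
using System;

// Hypothetical stand-in for UnityEngine.Vector3 so the sketch is self-contained.
public struct Vec3 { public float x, y, z; }

public static class BatchedForces
{
    // One call handles every active triangle for a single force type.
    // Inputs are plain arrays so the loop walks memory sequentially.
    // The formula (density * g * depth * area, pointing up) is a placeholder.
    public static int ComputeBuoyancyForces(float[] depth, float[] area,
                                            int triangleCount, Vec3[] forceOut)
    {
        const float waterDensity = 1000f; // kg/m^3
        const float gravity = 9.81f;      // m/s^2

        // No function calls and no branches inside the loop, just math.
        for (int i = 0; i < triangleCount; i++)
        {
            forceOut[i].x = 0f;
            forceOut[i].y = waterDensity * gravity * depth[i] * area[i];
            forceOut[i].z = 0f;
        }
        return triangleCount; // number of forces written
    }
}
```

A second force type would get its own function of the same shape, called once after this one, rather than being interleaved triangle by triangle.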
Internally each of those functions returns an array of positions and forces and tacks on the number of forces. The force/position arrays are a fixed size, allocated once and set to the maximum possible number of forces, so there is not a “new” every frame.
(There is a maximum of one force of each kind per triangle in the mesh, and each triangle can be split into 3 triangles at the waterline at run time, so I just set this to the number of triangles in the mesh * 3. Easy. I keep track of the current number of triangles somewhere too so I’m not processing triangles that don’t currently exist.)
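A preallocated buffer like the one described could be sketched as follows. The field names mirror how the function further down uses them (`force.force[i]`, `force.position[i]`, `force.numberOfForces`), but whether the real `ForceAndPosition` is a class or a struct, and what its constructor looks like, are assumptions here. `Vec3` is again a stand-in for Unity's Vector3.

```csharp
// Hypothetical stand-in for UnityEngine.Vector3 so the sketch is self-contained.
public struct Vec3 { public float x, y, z; }

// Assumed layout: two parallel arrays plus a count of the valid entries.
public sealed class ForceAndPosition
{
    public readonly Vec3[] force;
    public readonly Vec3[] position;
    public int numberOfForces; // how many entries are valid this physics step

    public ForceAndPosition(int meshTriangleCount)
    {
        // Each mesh triangle can split into up to 3 at the waterline,
        // so 3x the mesh triangle count is the worst case.
        int capacity = meshTriangleCount * 3;
        force = new Vec3[capacity];       // allocated once, reused every step
        position = new Vec3[capacity];
        numberOfForces = 0;
    }
}
```

Each physics step would then overwrite the arrays from index 0 and set `numberOfForces` to however many entries were written, so nothing is allocated per frame and stale entries past the count are simply ignored.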
Internally all the computations are expanded so there aren’t any function calls at all for any of the triangles which is super fast. At the end of that I have an array with position/force vectors for the hydrodynamic forces, another array for the aerodynamic forces, etc…
Normally at this point you’d zip through them all and call rigidBody.AddForceAtPosition on every force, but again you have a separate function call there for every triangle, which is slow regardless of the engine or language you’re using, or anything else. I have thousands of forces acting on each boat, so it’s important to not have thousands of function calls too.
So instead of doing that, I compute the torques from the forces and force application positions myself, accumulate them all as I go, then do a single AddForce and AddTorque at the end. So if I have 10,000 forces I still only have 1 AddForce and 1 AddTorque call instead of 10,000 AddForceAtPosition calls.
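Written out, the accumulation amounts to the following. The symbols are my own notation, not anything from Unity: p_i is the world-space application point of force F_i, c is the rigid body's world center of mass, and N is the number of forces.

```latex
\begin{aligned}
\mathbf{F}_{\text{total}} &= \sum_{i=1}^{N} \mathbf{F}_i \\
\boldsymbol{\tau}_{\text{total}} &= \sum_{i=1}^{N} (\mathbf{p}_i - \mathbf{c}) \times \mathbf{F}_i \\
\mathbf{r} \times \mathbf{F} &= \bigl( r_y F_z - r_z F_y,\; r_z F_x - r_x F_z,\; r_x F_y - r_y F_x \bigr)
\end{aligned}
```

For a rigid body, applying the total force at the center of mass plus the total torque should have the same net effect as applying each force at its own position, which is why one AddForce and one AddTorque can stand in for N AddForceAtPosition calls.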
Here’s the function. This is called once per force type (hydro, aero, etc.) per rigid body per physics cycle:
void AddForcesToRigidBody(ForceAndPosition force, Rigidbody inRigidBody)
{
    Vector3 totalTorque;
    Vector3 totalForce;
    Vector3 positionRelativeToCenterOfMass;
    //Vector3 positionCenterOfMass = inRigidBody.worldCenterOfMass; //This is now computed in FixedUpdate().
    totalTorque.x = 0;
    totalTorque.y = 0;
    totalTorque.z = 0;
    totalForce.x = 0;
    totalForce.y = 0;
    totalForce.z = 0;
    for (int i = 0; i < force.numberOfForces; i++)
    {
        positionRelativeToCenterOfMass.x = force.position[i].x - positionCenterOfMass.x;
        positionRelativeToCenterOfMass.y = force.position[i].y - positionCenterOfMass.y;
        positionRelativeToCenterOfMass.z = force.position[i].z - positionCenterOfMass.z;
        totalTorque.x += positionRelativeToCenterOfMass.y * force.force[i].z -
                         positionRelativeToCenterOfMass.z * force.force[i].y;
        totalTorque.y += positionRelativeToCenterOfMass.z * force.force[i].x -
                         positionRelativeToCenterOfMass.x * force.force[i].z;
        totalTorque.z += positionRelativeToCenterOfMass.x * force.force[i].y -
                         positionRelativeToCenterOfMass.y * force.force[i].x;
        totalForce.x += force.force[i].x;
        totalForce.y += force.force[i].y;
        totalForce.z += force.force[i].z;
    }
    inRigidBody.AddForce(totalForce);
    inRigidBody.AddTorque(totalTorque);
    //Above code is 0.02 ms versus 0.24 ms for the loop below on deep profile. Better than 10 times faster.
    //for (int i = 0; i < force.numberOfForces; i++)
    //    inRigidBody.AddForceAtPosition(force.force[i], force.position[i]);
}
Internally there I’m doing all the same calculations that AddForceAtPosition probably does, but I get rid of thousands of function calls per frame. It’s not that Unity is slow, it’s just the overhead of the function call itself. And perhaps most importantly, there are not any function calls at all, whether they’re Unity functions or my own functions, inside of that loop that get called once per triangle. There are no “if” statements either. Just math, and since this is all done on arrays of forces and positions next to each other in memory, it’s likely that the data is all sitting in the CPU cache instead of being retrieved from the very slow RAM every time the loop iterates. That alone can speed up the code significantly. Even the transforms are cached in the beginning before the loop.
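The point about caching values before the loop can be illustrated without any timing claims by counting how often a property getter runs. This is only a sketch: `FakeTransform` is a made-up stand-in for a Unity Transform, its `right` getter just returns a constant, and `Vec3` stands in for Vector3.

```csharp
// Hypothetical stand-in for UnityEngine.Vector3 so the sketch is self-contained.
public struct Vec3 { public float x, y, z; }

// Made-up stand-in for a Transform: "right" is a property, so every read runs code.
public sealed class FakeTransform
{
    public int rightReads; // counts how many times the getter has run

    public Vec3 right
    {
        get { rightReads++; return new Vec3 { x = 1f, y = 0f, z = 0f }; }
    }
}

public static class PropertyCaching
{
    // Reads the property inside the loop: one getter call per point.
    public static float SumUncached(FakeTransform t, Vec3[] points, int count)
    {
        float sum = 0f;
        for (int i = 0; i < count; i++)
            sum += points[i].x * t.right.x;
        return sum;
    }

    // Reads the property once, then uses a plain local copy inside the loop.
    public static float SumCached(FakeTransform t, Vec3[] points, int count)
    {
        Vec3 right = t.right;
        float sum = 0f;
        for (int i = 0; i < count; i++)
            sum += points[i].x * right.x;
        return sum;
    }
}
```

Both versions produce the same sum, but for `count` points the first runs the getter `count` times and the second runs it once. With a real Transform each of those getter calls does actual work, which is where the savings would come from.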
I also get rid of TransformPoint and TransformDirection in a similar way by computing them all manually and stuffing the results into an array. For example, I have to rotate/translate all my physics triangles in the boats from local to world space every physics frame. Instead of calling the Unity functions for every single triangle one at a time, I do it manually all in one shot in a single function call. Since it’s sent in as an array of positions, it’s all packed together in memory too which is good for CPU caching. This runs far faster than calling TransformPoint on every single thing you want to transform one at a time.
If the code below replaced “right” with “transform.right”, it would be monstrously slower, because transform.right is accessing a property. I.e., it’s basically a function call rather than a number just sitting in memory. Same goes for transform.up and transform.forward of course.
void TransformPhysicsTrianglesFromLocalToWorldPosition(HydroTriangle triangle, Vector3[] fromPoint, Vector3[] toPoint)
{
    Vector3 rotatedPositionLocal;
    Vector3 right = transformRight;// cached value of transform.right;
    Vector3 up = transformUp;// cached value of transform.up;
    Vector3 forward = transformForward;// cached value of transform.forward;
    Vector3 position = transformPositionFixedUpdate;// cached value of transform.position;
    for (int i = 0; i < triangle.numberOfTriangles; i++)
    {
        rotatedPositionLocal.x = fromPoint[i].x * right.x +
                                 fromPoint[i].y * up.x +
                                 fromPoint[i].z * forward.x;
        rotatedPositionLocal.y = fromPoint[i].x * right.y +
                                 fromPoint[i].y * up.y +
                                 fromPoint[i].z * forward.y;
        rotatedPositionLocal.z = fromPoint[i].x * right.z +
                                 fromPoint[i].y * up.z +
                                 fromPoint[i].z * forward.z;
        toPoint[i].x = rotatedPositionLocal.x + position.x;
        toPoint[i].y = rotatedPositionLocal.y + position.y;
        toPoint[i].z = rotatedPositionLocal.z + position.z;
    }
}
In this way I can just hand it an array of whatever type of points I want transformed and it will stick them in some other array, or even back into the original array, since the write to “toPoint” is done at the very end of each iteration. I do everything this way now and have seen rather astonishing speed increases. The key is to think of doing one function call per frame in a big loop instead of thousands of function calls. Again, that’s probably just the overhead of the function call itself. Even if the function does nothing at all it still takes time to call it and return nothing. The CPU and RAM are doing things like setting up and popping the stack and so forth every time a function is called, whether it’s a Vector3 or Rigidbody or Transform function or one of my own functions. It’s not a Unity-specific thing.
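Here is a small self-contained sketch of that in-place usage. It is a variation on the function above rather than the actual code: the basis vectors and position are passed in as parameters instead of read from cached fields, `Vec3` stands in for Unity's Vector3, and like the original it assumes the transform has no scale.

```csharp
// Hypothetical stand-in for UnityEngine.Vector3 so the sketch is self-contained.
public struct Vec3 { public float x, y, z; }

public static class BatchTransform
{
    // Rotates then translates 'count' points from local to world space.
    // fromPoint and toPoint may be the same array: all three rotated
    // components are computed into locals before anything is written.
    public static void LocalToWorld(Vec3[] fromPoint, Vec3[] toPoint, int count,
                                    Vec3 right, Vec3 up, Vec3 forward, Vec3 position)
    {
        for (int i = 0; i < count; i++)
        {
            float x = fromPoint[i].x * right.x + fromPoint[i].y * up.x + fromPoint[i].z * forward.x;
            float y = fromPoint[i].x * right.y + fromPoint[i].y * up.y + fromPoint[i].z * forward.y;
            float z = fromPoint[i].x * right.z + fromPoint[i].y * up.z + fromPoint[i].z * forward.z;

            toPoint[i].x = x + position.x;
            toPoint[i].y = y + position.y;
            toPoint[i].z = z + position.z;
        }
    }
}
```

Calling `BatchTransform.LocalToWorld(points, points, points.Length, right, up, forward, position)` overwrites `points` with their world-space positions in a single call, with no second array needed.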
If you want to post some of your particle code here or send some of it to me privately, I’d be happy to take a look and suggest ways in which it could be sped up. I’ll keep it confidential. I’m not in the business of writing particle systems and don’t ever plan to be. There may be huge gains that can be made, or there might be none. Depends what you’re doing with the particles. 