Particle Simulator: too much to compute per frame

Hello!
I’m working on a particle simulator program. I’d like to use the Unity engine’s rigid body mechanics for collisions, though I will compute all other forces myself during FixedUpdate.

My problem occurs when I add more particles to the simulator than my computer can compute forces for, and still animate smoothly at runtime.

I would like to solve this as follows: rather than displaying the particles onscreen as they are simulated, I would like to RECORD the simulation, for playback later. During playback we won’t need to do any force or collision computations, just put each particle in its recorded position for each frame, so it should be smooth.

I have no idea how to ensure that the Time.deltaTime between FixedUpdate calls is always the same, regardless of how long each step actually takes to process. My initial tests showed this value does not change regardless of how large the sim gets, but these results are limited (and, I guess, rendered irrelevant) by the fact that the program and Unity will completely freeze up if the sim is TOO large.

In my old, non-Unity version of this program (with poor collisions) I had to put the sim physics processing in a completely separate thread from the UI in order to prevent freeze-ups. If a separate thread wouldn’t work in Unity, I’d be willing to rig up pauses in the processing for the UI. But I suspect both these methods will mess with the Unity collision physics, right? Perhaps there is some way I can compute the collisions on demand, rather than within the standard physics engine?

All ideas and suggestions are appreciated.

I’m not sure how to save things, but if you can find out how to save a file perhaps you could just write the particle states during a record session into an array and write them to disk?
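A minimal sketch of the record/playback idea, assuming a hypothetical ParticleState struct that holds only a position (a real sim would probably store orientation too). SimRecorder, WriteFrame and ReadFrame are made-up names for illustration; BinaryWriter and BinaryReader are the standard .NET classes. During recording you would call WriteFrame once per physics step, and during playback ReadFrame once per displayed frame.

```csharp
using System.IO;

// Hypothetical per-particle state: position only, for illustration.
public struct ParticleState
{
    public float X, Y, Z;
}

public static class SimRecorder
{
    // Appends one frame: the particle count, then x, y, z for each particle.
    public static void WriteFrame(BinaryWriter writer, ParticleState[] particles)
    {
        writer.Write(particles.Length);
        for (int i = 0; i < particles.Length; i++)
        {
            writer.Write(particles[i].X);
            writer.Write(particles[i].Y);
            writer.Write(particles[i].Z);
        }
    }

    // Reads back one frame in the same layout that WriteFrame produced.
    public static ParticleState[] ReadFrame(BinaryReader reader)
    {
        int count = reader.ReadInt32();
        var particles = new ParticleState[count];
        for (int i = 0; i < count; i++)
        {
            particles[i].X = reader.ReadSingle();
            particles[i].Y = reader.ReadSingle();
            particles[i].Z = reader.ReadSingle();
        }
        return particles;
    }
}
```

Writing each frame straight to a stream, rather than keeping every frame in one big array, keeps memory use flat for long recordings.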

One of the tricks I used in VRC Pro’s particle system (not Unity, but it should still apply here if you’re updating the particles yourself) was to not update every particle’s state every frame. In VRC the physics time step was very small, around 1/250th of a second depending on the car that was being done (1:12 scale cars were 1/500th if I recall correctly), so updating all the particles that quickly looked amazing, but after a few thousand particles it started to affect things too much.

What I did was batch the particles into a hard coded number of groups (5 to 8 I think, but would have to check the code again to be sure). Every physics cycle, it only processed one of the groups. I.e., it would compute forces and update positions/velocities, etc., on group 1 on one frame. The next physics frame it would update group 2. The next physics frame would do group 3, etc… As a result I was able to do several times more particles than trying to do them every frame. I just played with the number of groups until it looked ok. The performance in the particle processing was increased several hundred percent this way.
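A rough sketch of the grouping trick, not the VRC Pro code itself. GroupedUpdater is a hypothetical name, the particles are reduced to one dimension to keep it short, and scaling the time step by the number of groups (so each particle still covers the right amount of time) is my assumption rather than something stated above.

```csharp
// Hypothetical helper: updates one group of particles per physics step.
public sealed class GroupedUpdater
{
    private readonly int groupCount;
    private int currentGroup;

    public GroupedUpdater(int groupCount)
    {
        this.groupCount = groupCount;
    }

    // Call once per physics step. Only particles whose index falls in the
    // current group are advanced; the others wait for their turn.
    public void Step(float[] positions, float[] velocities, float dt)
    {
        // Assumption: each particle is visited every groupCount steps,
        // so it is advanced by groupCount time steps at once.
        float groupDt = dt * groupCount;

        for (int i = currentGroup; i < positions.Length; i += groupCount)
        {
            positions[i] += velocities[i] * groupDt;
        }

        // Move on to the next group for the next physics step.
        currentGroup = (currentGroup + 1) % groupCount;
    }
}
```

With groupCount set to 5, each call touches roughly a fifth of the particles, which is where the savings come from.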

If the particles don’t update visually enough for you and your physics per particle is fairly intensive, you could try updating the positions of all particles every frame, and only update the velocities/accelerations and so forth in groups as described. You could probably do enough particles this way where the bottleneck becomes rendering instead of computing the states. In that case there may not be a need to record the states and do a simple playback at all.

Thanks for the post Todd. An interesting idea that I too have used previously, to optimize performance. My concern with this method is that I won’t be computing the changes in forces (due to the changes in position that we ARE updating every frame) often enough to ensure accuracy.
Even if the inaccuracy is small for X particles, we keep using the same time step (so collisions work) no matter how many computations are needed. So as the particle count grows, either the inaccuracy grows with it or the sim starts slowing down again.

If I could simply use the collisions on demand, rather than automatically with the physics engine, I think that would solve the issue. Then I could scale the physics time completely separate from the UI time. (Though this would require the record/playback method, I’m ok with that.)

I was hoping for something like this where I pass my own deltaTime:

if (collision(particleA, particleB, deltaTime)) // checks the particles' position, orientation and velocity (linear, spin) vectors
{
    bounce(ref particleA, ref particleB, deltaTime); // changes the particles' position, orientation and velocity (linear, spin) vectors
}

Ok, so what you’re really wanting to do more or less is turn collisions on and off per particle, while Unity only does the whole system one way or the other?

That’s a tough one, I’m not sure what you could do. I didn’t bother with collisions in my particle system for VRC because it didn’t need it, that would be adding probably an order of magnitude or more of work per particle, so I considered it out of the question when it was requested at one point. Your case must need it though for whatever reason. Ok.

There’s probably a slim chance of this being correct, but you might check to see if particle collisions are only processed between subsequent FixedUpdate() steps or if it’s every Update() or what. If the physics engine is doing the collisions and is running at the FixedUpdate() rate, perhaps collisions are only being processed at that rate and you might be able to take advantage of that somehow. I don’t know, just thinking very broadly. The general idea is that you update only the positions from velocities in Update() or LateUpdate(), and update the rest (forces, accelerations, velocities, and let Unity do the collisions as usual) in FixedUpdate(). So the particles can move over subsequent graphics frames with no collisions, then every FixedUpdate() it checks collisions, so it’s not really doing collisions every frame then.

That may be nonsense though, it’s possible that Unity is doing all of it every graphics frame anyway, although I’d be surprised if collisions were done every graphics frame. I don’t know how it works here.

I’m just thinking very broadly and have never tried something like that, and I don’t know if this next one would work. Just trying to spur some ideas:

Perhaps it might be possible to do this as two particle systems in place of one, one with collisions turned on and the other without them, then swap particle states back and forth between the two as necessary. I.e., particle # 58 is a collision particle in this time step or frame, and on the next time step or frame it becomes particle # 8965 in the no-collision particle system by copying its state there. Although that may require reshuffling both arrays every cycle which could get messy and maybe be even slower than just leaving the collisions on all the time in one system.

On the other hand, if you’re already iterating over all of them one at a time to process parts of it yourself, maybe a quick memory copy here and there might not hurt you too much. Managing those particle arrays might be a challenge, but if you find periods of time where only 10% of the particles actually need collision processing done on them, and you’re able to decide that yourself on a per particle basis once per frame (or better, once every few frames in some kind of grouping method), perhaps it might be worth it. I don’t know that there are huge gains to be made there, but perhaps it’s food for thought anyway.

On your original post: Delta time between FixedUpdate() is always the same by definition as far as I know. A physics engine time step, once set, never changes. The only variable time step should be the time between graphics frames since the graphics frame rate is allowed to change.

Using a separate thread: Looking at one of my particle systems where I control each particle individually instead of having Unity do it, I’d think multithreading it wouldn’t work because it’s got a SetParticles() and GetParticles() call in them. In order to have those functions, my particle class has to inherit from MonoBehaviour, and you can’t have a MonoBehaviour running in a separate thread as far as I know.

There might be a way somehow though, perhaps updating a particle array in the separate thread that updates some global type of flag but doesn’t call any MonoBehaviour functions directly itself, then doing Get/SetParticles in the separate thread routine on that array. By doing a double buffer type of arrangement you might be safe for threading by guaranteeing that the Unity thread is only working on a copy that the separate particle thread is not touching at that moment.
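Here is one way the double buffer arrangement might look, as a sketch only. DoubleBuffer, WriteAndPublish and CopyLatestTo are hypothetical names, and the particle data is reduced to a float array. The assumption is that a single worker thread calls WriteAndPublish and only the main Unity thread calls CopyLatestTo (and then does any Unity calls itself).

```csharp
using System;

// Hypothetical double buffer: one worker thread writes, the main thread reads.
public sealed class DoubleBuffer
{
    private float[] front; // last published results, read by the main thread
    private float[] back;  // scratch buffer, touched only by the worker thread
    private readonly object swapLock = new object();

    public DoubleBuffer(int size)
    {
        front = new float[size];
        back = new float[size];
    }

    // Worker thread: compute into the back buffer, then publish it by swapping.
    public void WriteAndPublish(Action<float[]> fill)
    {
        fill(back); // no lock needed, the main thread never reads "back"

        lock (swapLock)
        {
            float[] temp = front;
            front = back;
            back = temp;
        }
    }

    // Main thread: copy the most recently published results out.
    public void CopyLatestTo(float[] destination)
    {
        lock (swapLock)
        {
            Array.Copy(front, destination, front.Length);
        }
    }
}
```

The lock is only held for the swap and the copy, so the expensive computation in fill never blocks the main thread.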

At one point I was doing really complicated procedural sound to have a real time gas dynamic simulation of an engine running to produce engine audio. A compute shader filled sound buffers that were read back to Unity and then copied over in some Unity audio function that runs in a separate thread. I.e., in that thread I couldn’t do any MonoBehaviour functions, so instead that thread would set a flag that the main Unity thread would read to determine if it was time to load in a new sound buffer. So something like that might be feasible with particles if you really want to try and use the whole CPU and dedicate a core or two to particle processing.

Sounds like a lot of work, I’m not sure how deep into the rabbit hole you really want to go there. Like you said, it’d be easier to just turn collisions on and off per particle. I’d probably think about the first example in this post first, where you have two particle systems. That might be really tricky though with Unity creating new particles every frame. I found in the last couple of Unity 5 betas, it appears to create and render the freshest particles before you get a chance to modify them. Beta 14 worked fine but 16 and 18 seem to skip a frame in there and you lose some control.

You may know this already, but just in case: If you’re looking for more speed, you might try eliminating all of the Vector3 method calls like dot/cross, including mathematics like addition/subtraction/etc… I.e., instead of doing this:

Vector3 firstTerm = whatever;
Vector3 secondTerm = whatever;
Vector3 answer;
answer = firstTerm + secondTerm;

You do this:

Vector3 firstTerm = whatever;
Vector3 secondTerm = whatever;
Vector3 answer;
answer.x = firstTerm.x + secondTerm.x;
answer.y = firstTerm.y + secondTerm.y;
answer.z = firstTerm.z + secondTerm.z;

Same goes for .Dot and .Magnitude and so on, I prototype stuff using the Vector3 functions and when it’s working go back and expand it all out so there aren’t any method calls. I’m working on a speedboat simulator that does hydrodynamics/aerodynamics/buoyancy/etc. on a per triangle basis, huge physics processing, but was able to get at least 10 times more speed by eliminating the Vector3 methods and hand coding it all myself. One part of it ran 80 times faster this way, but typically it’s more like 5 to 20 times speed difference. Overall it’s 10 times faster though just by paying close attention to this type of stuff. Setting one Vector3 equal to another is just as fast though, so that’s about the only place I don’t manually code it all.

In other words, if you can’t gain anything by switching it to two particle systems (one with collisions turned on and the other with them turned off), or spending 10 years trying to multithread it all, there may be other areas where you can gain massively if you’re doing your own processing on each particle, and by the sounds of it with so many particles every little bit of CPU counts big time in your application. My boat sim wouldn’t be possible without hand coding the stuff. The Vector3 method calls just take too much time.

“Perhaps it might be possible to do this as two particle systems in place of one, one with collisions turned on and the other without them, then swap particle states back and forth between the two as necessary”

Woah! Blow my mind, why dontcha?
I’m thinking a slight variation on the method you proposed: rather than swapping the actual particles, I’ll keep them all in place by adjusting their velocity vectors to ZERO, while I do all my computations, then I’ll fill back in their newly computed velocity vectors, let it run for a single FixedUpdate cycle, allowing them to move and do whatever collisions. Then I can get their new-post-collision velocities for expensive computations, and again, immediately set them to zero until I’m done with my computation.
As long as no external forces change their velocity and position, they should remain fixed in place.

Thanks a million Todd for your extensive response. Thanks a billion for the idea!

Wow, that’s pretty awful about the Vector3 computations! I blame the compiler, it should be doing that kind of stuff for us, if it’s faster.
Or perhaps it’s due to the fact that all of those operations that return a Vector3 must first create a new Vector3() to be returned? In which case, a function that takes an already created Vector3 for the result, as an OUT parameter, would be faster, e.g. Vector3.Add(Vector3 v1, Vector3 v2, out Vector3 resultVector)
If so (would need to test), creating a single function like this could save you a lot of manual expansion. (though it would still introduce function call overhead)
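To make the idea concrete, here is a sketch of such a function. Unity’s Vector3 has no Add method with this signature as far as I know, so this uses a stand-in Vec3 struct and a hypothetical Vec3Math.Add. Passing the inputs by ref (to avoid copying them) is an extra assumption on top of the suggestion above, and whether any of this is actually faster would need to be measured.

```csharp
// Stand-in for Vector3, so the sketch compiles without Unity.
public struct Vec3
{
    public float X, Y, Z;
}

public static class Vec3Math
{
    // Writes a + b into result. The caller supplies the storage for the
    // result, so nothing is constructed and returned by value.
    public static void Add(ref Vec3 a, ref Vec3 b, out Vec3 result)
    {
        result.X = a.X + b.X;
        result.Y = a.Y + b.Y;
        result.Z = a.Z + b.Z;
    }
}
```

A call would look like Vec3Math.Add(ref firstTerm, ref secondTerm, out answer); it still pays for one function call per addition.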

Glad to be of help. This is fun stuff to think about. :)

Setting velocities to 0: That’s a very clever idea as long as that really tricks Unity into skipping collision detection per particle. I have my doubts though, do you already know if it does? Perhaps it might make Unity skip at least part of the processing? I don’t know whether internally they have “if(velocity==0)” or something similar per particle, but I’d be surprised if they did, given that they have a “per particle system” setting for collisions being on or off. If you try it, I’d be interested in reading about what you find either way. You’ve got me curious.

If I were writing a particle system from scratch that did collisions, I’d try as hard as I could to avoid “if” statements that operate on a per particle basis, so setting velocities to 0 wouldn’t cause particles to skip collisions entirely. I.e., I’d probably do it as two particle systems so thousands of particles can be omitted with a single “if” statement instead of doing one for every particle. That’s probably why in Unity you can’t turn collisions on and off individually per particle. Either they all get collisions or none of them do. The “if” statements are expensive.

Anyway, keep me posted. I’m interested in how your work on that progresses. :)

Vector3: I don’t blame the compiler or Unity. Doing a method call is slower than not doing one regardless of the engine, language, or compiler. Regardless of the case, the CPU has to do more work because there is a function call, so it’s best to skip it where you can. Adding a couple numbers together in variables directly in code is always going to be faster than calling some function that does the same thing. It’d still be slower whether or not you did that as an “out” or there was a “new” in there. A function call takes time regardless.

I went to fairly absurd levels in the boat simulator, but it was worth it. I think in terms of groups of identical computations that need to be done and then do them all in a single function call, preferably on memory that is packed together in a single array. I have several functions that compute different types of forces (hydrodynamic, aerodynamic, skin friction, buoyancy, and whatever), but they do it on the entire batch of triangles at one time.

First I might compute all the buoyancy forces on all triangles, then all the hydrodynamic forces, etc… Each type of force is a relatively small function so as it’s cycling through those triangles in a simple loop, the CPU cache can be taken advantage of. Often times that alone can significantly speed up your code. So instead of calling something like ComputeHydroForce() on a single triangle, then calling ComputeBuoyancyForce() on that triangle, etc., I do ComputeHydroForces() (note the “s” at the end) one time which computes that force for all of the triangles, then do ComputeBuoyancyForces() (again with an “s” at the end) one time which computes that force for all the triangles, then the next force, etc… The CPU cache likes that because it’s not jumping in and out of different functions as it visits each triangle, and I do one function call instead of thousands.

Internally each of those functions returns an array of positions and forces and tacks on the number of forces. The force/position arrays are static size and set to the maximum possible number of forces so there is not a “new” every frame.

(There is a maximum of one force of each kind per triangle in the mesh, and each triangle can be split into 3 triangles at the waterline at run time, so I just set this to the number of triangles in the mesh * 3. Easy. I keep track of the current number of triangles somewhere too so I’m not processing triangles that don’t currently exist.)

Internally all the computations are expanded so there aren’t any function calls at all for any of the triangles which is super fast. At the end of that I have an array with position/force vectors for the hydrodynamic forces, another array for the aerodynamic forces, etc…

Normally at this point you’d zip through them all and call rigidBody.AddForceAtPosition on every force, but again you have a separate function call there for every triangle which is slow regardless of the engine or language you’re using, or anything else. I have thousands of forces acting on each boat, so it’s important to not have thousands of function calls too.

So instead of doing that, I compute the torques from the forces and force application positions myself, accumulate them all as I go, then do a single AddForce and AddTorque at the end. So if I have 10,000 forces I still only have 1 AddForce and 1 AddTorque call instead of 10,000 AddForceAtPosition calls.

Here’s the function. This is called once per force type (hydro, aero, etc.) per rigid body per physics cycle:

    void AddForcesToRigidBody(ForceAndPosition force, Rigidbody inRigidBody)
    {
        Vector3 totalTorque;
        Vector3 totalForce;

        Vector3 positionRelativeToCenterOfMass;
        //Vector3 positionCenterOfMass = inRigidBody.worldCenterOfMass; //This is now computed in FixedUpdate().
        totalTorque.x = 0;
        totalTorque.y = 0;
        totalTorque.z = 0;

        totalForce.x = 0;
        totalForce.y = 0;
        totalForce.z = 0;

        for (int i = 0; i < force.numberOfForces; i++)
        {
            positionRelativeToCenterOfMass.x = force.position[i].x - positionCenterOfMass.x;
            positionRelativeToCenterOfMass.y = force.position[i].y - positionCenterOfMass.y;
            positionRelativeToCenterOfMass.z = force.position[i].z - positionCenterOfMass.z;

            totalTorque.x += positionRelativeToCenterOfMass.y * force.force[i].z -
                             positionRelativeToCenterOfMass.z * force.force[i].y;
            totalTorque.y += positionRelativeToCenterOfMass.z * force.force[i].x -
                             positionRelativeToCenterOfMass.x * force.force[i].z;
            totalTorque.z += positionRelativeToCenterOfMass.x * force.force[i].y -
                             positionRelativeToCenterOfMass.y * force.force[i].x;

            totalForce.x += force.force[i].x;
            totalForce.y += force.force[i].y;
            totalForce.z += force.force[i].z;
        }

        inRigidBody.AddForce(totalForce);
        inRigidBody.AddTorque(totalTorque);

        //Above code is 0.02 versus 0.24 ms on deep profile.  Better than 10 times faster.
        //for (int i = 0; i < force.numberOfForces; i++)
        //    inRigidBody.AddForceAtPosition(force.force[i], force.position[i]);
    }

Internally there I’m doing all the same calculations that AddForceAtPosition probably does, but I get rid of thousands of function calls per frame. It’s not that Unity is slow, it’s just the overhead of the function call itself. And perhaps most importantly, there are not any function calls at all, whether they’re Unity functions or my own functions, inside of that loop that get called once per triangle. There are no “if” statements either. Just math, and since this is all done on arrays of forces and positions next to each other in memory, it’s likely that the data is all sitting in the CPU cache instead of being retrieved from the very slow RAM every time the loop iterates. That alone can speed up the code significantly. Even the transforms are cached in the beginning before the loop.

I also get rid of TransformPoint and TransformDirection in a similar way by computing them all manually and stuffing the results into an array. For example, I have to rotate/translate all my physics triangles in the boats from local to world space every physics frame. Instead of calling the Unity functions for every single triangle one at a time, I do it manually all in one shot in a single function call. Since it’s sent in as an array of positions, it’s all packed together in memory too which is good for CPU caching. This runs far faster than calling TransformPoint on every single thing you want to transform one at a time.

If the code below replaced “right” with “transform.right”, it would be monstrously slower, because transform.right is accessing a property. I.e., it’s basically a function call rather than a number just sitting in memory. Same goes for transform.up and transform.forward of course.

    void TransformPhysicsTrianglesFromLocalToWorldPosition(HydroTriangle triangle, Vector3[] fromPoint, Vector3[] toPoint)
    {
        Vector3 rotatedPositionLocal;

        Vector3 right = transformRight;// cached value of transform.right;
        Vector3 up = transformUp;// cached value of transform.up;
        Vector3 forward = transformForward;// cached value of transform.forward;

        Vector3 position = transformPositionFixedUpdate;// cached value of transform.position;

        for (int i = 0; i < triangle.numberOfTriangles; i++)
        {
            rotatedPositionLocal.x = fromPoint[i].x * right.x +
                                     fromPoint[i].y * up.x +
                                     fromPoint[i].z * forward.x;

            rotatedPositionLocal.y = fromPoint[i].x * right.y +
                                     fromPoint[i].y * up.y +
                                     fromPoint[i].z * forward.y;

            rotatedPositionLocal.z = fromPoint[i].x * right.z +
                                     fromPoint[i].y * up.z +
                                     fromPoint[i].z * forward.z;

            toPoint[i].x = rotatedPositionLocal.x + position.x;
            toPoint[i].y = rotatedPositionLocal.y + position.y;
            toPoint[i].z = rotatedPositionLocal.z + position.z;
        }
    }

In this way I can just hand it an array of whatever type of points I want transformed and it will stick them in some other array, or even back into the original array since the write to “toPoint” is done at the very end. I do everything this way now and have seen rather astonishing speed increases. The key is to think of doing one function call per frame in a big loop instead of thousands of function calls. Again, it’s just because of the overhead of the function call itself probably. Even if the function does nothing at all it still takes time to call it and return nothing. The CPU and RAM are doing things like setting up and popping the stack and so forth every time a function is called, whether it’s a Vector3 or RigidBody or Transform function or one of my own functions. It’s not a Unity specific thing.

If you want to post some of your particle code here or send some of it to me privately, I’d be happy to take a look and suggest ways in which it could be sped up. I’ll keep it confidential. I’m not in the business of writing particle systems and don’t ever plan to be. There may be huge gains that can be made, or there might be none. Depends what you’re doing with the particles. :)

Just spent about 30 minutes trying to find out how to do inline (a C/C++ compiler hint) in C#. It looks like some versions of .NET and compilers for C# have a variation of this (optimization - Inline functions in C#? - Stack Overflow), but it looks like not Unity/Mono, or at least not this version.
In C and C++, this option tells the compiler to basically copy the function’s body directly into the code where it would normally be called. This eliminates all that stack push & pop stuff but, obviously, makes the .exe larger (not really an issue these days).
I’m shocked and appalled that C# doesn’t have any way to deal with this, other than manually expanding the functions like you did. Maybe with #define functions… Nope: C# Macro definitions in Preprocessor - Stack Overflow

Regarding optimization for MY project: I’m still quite a ways off from that. Even with optimization, if we add enough particles to the simulation, the computations needed will inevitably scale beyond what a single cpu can do in realtime: so that is what I’ll need to address first.

Yeah, I looked too. Inline functions aren’t supported unfortunately.
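For reference, the .NET variation mentioned above is, as far as I can tell, the MethodImplOptions.AggressiveInlining attribute (available from .NET 4.5). It is only a hint to the JIT, and per the findings above it may not be available or honored in the Unity/Mono version being discussed, so treat this as a sketch for newer runtimes. InlineHint and Dot are hypothetical names.

```csharp
using System.Runtime.CompilerServices;

public static class InlineHint
{
    // Asks the JIT to inline this method at its call sites where it can.
    // The runtime is free to ignore the hint.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static float Dot(float ax, float ay, float az,
                            float bx, float by, float bz)
    {
        return ax * bx + ay * by + az * bz;
    }
}
```

If the hint is honored, the call site ends up looking much like the hand-expanded version, without having to expand it by hand.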

If you’re looking at really expanding it into huge computational territory, compute shaders may be the way to go. I was running something like 40 billion computations per second in my car engine compute shader and Unity still ran. It hit the frame rate pretty hard but it did indeed work.

I came from a PowerBasic background that had function macros. Basically the same thing as inline functions, I think. It would be a text replacement which removed the function call completely.

inline:
Ya, I was starting to think about some kind of pre-compiler code-modifier program that automatically expands functions inline, and the output of THAT is what you compile. I would never have the patience to do it manually, except perhaps in a single (small) function.

Update:
I’ve tested out the sim using the method you suggested and I modified, but setting the velocities to zero is definitely messing stuff up: particle GameObjects are mysteriously disappearing. Extensive logging shows these particles have undergone some collision in the past. So, I may need to check that a given particle has undergone “collisionExit” before I set its velocity to zero, or something like that. Other than that issue, it’s working great: the UI continues to respond properly no matter how many particles I add to the sim (I’ve only been able to test a limited number of particles so far, as that other issue becomes more prevalent as more particles are introduced)! Still playing with it: I’ll keep you updated.

Update: rather than setting velocities to zero, I set the game engine’s physics time rate to zero (no physics time advancement), did my computations in the other thread, then, when it finished, set the physics time rate back to its original value.

Clever! I didn’t realize setting it to 0 would do that. Thanks for the tip :)