Will Multithreading be of any help here?

I’m quite new to Unity and C#, and I’m confused about a few things that I’ll lay out below. I have a function in a script that iterates over a list and does some raycasts and physics-based work. The size of the list changes every few seconds. The problem is that the function is invoked in FixedUpdate(), and sometimes the list has hundreds of items, which causes heavy lag and frame drops. This shows up as visible spikes in the profiler (screenshot: Capture.PNG). So, using coroutines and threads came to mind.

But after researching a bit, I learned that the Unity API is not thread-safe and that I shouldn’t call Unity functions from another thread. Also, using coroutines won’t make sense either because they run on the same core, right? Currently, I’m trying to reduce the amount of calculation and use more efficient data structures, with little to no avail. So, how should I go about it? Will using coroutines or multithreading be viable here?

Yes, that’s right. The Unity API is not thread-safe. You can still do multithreading, but you can’t use anything from Unity in your worker threads; you need to use your own data structures and functions, or an external thread-safe library. It’s still possible, though, to do some work in threads and share the results with the main thread, and therefore with Unity.
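To make that concrete, here is a minimal sketch of the pattern, assuming a runtime where System.Threading.Tasks is available. The names Sample, BackgroundWork, ComputeLengths and Start are made up for illustration; the point is only that the worker touches plain data and nothing from UnityEngine.

```csharp
using System;
using System.Threading.Tasks;

// Plain data only: no UnityEngine types are touched off the main thread.
public struct Sample
{
    public float X, Y, Z;
}

public static class BackgroundWork
{
    // Pure math on our own arrays, so it is safe to run on any thread.
    public static float[] ComputeLengths(Sample[] input)
    {
        var result = new float[input.Length];
        for (int i = 0; i < input.Length; i++)
        {
            Sample s = input[i];
            result[i] = (float)Math.Sqrt(s.X * s.X + s.Y * s.Y + s.Z * s.Z);
        }
        return result;
    }

    // Starts the job on a thread-pool thread and hands back the Task.
    // The caller must not modify 'snapshot' while the job is running.
    public static Task<float[]> Start(Sample[] snapshot)
    {
        return Task.Run(() => ComputeLengths(snapshot));
    }
}
```

In a Unity script, the main thread would copy whatever it needs (positions, for example) into the snapshot array, call Start, then check task.IsCompleted in a later Update() and read task.Result there. Raycasts and other Unity calls would still have to stay on the main thread.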

Coroutines are not technically threads. They run on the main thread. A coroutine function is just split into several chunks at its yield statements, and the chunks are executed one after another across frames, serially rather than in parallel. Using coroutines, you would need to decide how to distribute your work over several frames.
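As a rough illustration of the chunking, here is a plain C# iterator of the kind Unity drives as a coroutine. ChunkedWork, ProcessInChunks and the doubling "work" are placeholders I made up; chunkSize is assumed to be greater than zero.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class ChunkedWork
{
    // Processes 'items' in slices of 'chunkSize', pausing after each slice.
    public static IEnumerator ProcessInChunks(List<int> items, int chunkSize, List<int> output)
    {
        for (int start = 0; start < items.Count; start += chunkSize)
        {
            int end = Math.Min(start + chunkSize, items.Count);
            for (int i = start; i < end; i++)
            {
                output.Add(items[i] * 2); // stand-in for the real per-item work
            }
            // Execution stops here until the iterator is advanced again.
            // Started with StartCoroutine in Unity, that happens on the next frame.
            yield return null;
        }
    }
}
```

Everything still runs on one thread. The total work is the same, it is just spread out so that no single frame pays for the whole list.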

I don’t know what you’re doing, but I’d be willing to bet that you can optimize the heck out of it to the point where you wouldn’t care about multithreading it. My current project is a boat simulator that iterates over hundreds of polygons to compute thousands of forces every physics cycle. It’s a heavy workload, but I just do it all in the main thread and it’s plenty fast enough, runs 400+ fps on a single monitor, 200 or so on triples and is quick enough for Oculus.

You mentioned lists. One trick is to use a fixed-size array or list that holds the maximum possible number of entries you could ever need, then compute and fill that structure. For instance, in my boat sim I split every mesh triangle at the waterline into three sub-triangles, so the number of triangles that are getting computations done on them changes every frame. If I were trying to build up a list every time, it would be horribly slow. Instead I just use a simple structure of fixed-size arrays and keep track of how many triangles are currently active. Then it’s just a loop from 0 to however many there are that frame. Lots of code is run on every triangle and it’s not a problem.

It depends on what you’re doing for each list entry, of course, but if you’re computationally heavy you can get some serious speed increases by eliminating Vector3 method calls and hand-coding all the math. Here’s an article I wrote a while back on that:

http://www.performancesimulations.com/wp/how-to-get-big-speed-increases-in-unitys-physics-or-any-math-heavy-code/
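For readers who don’t follow the link, here is a small sketch of what "hand coding the math" can look like. It uses a stand-in Vec3 struct rather than Unity’s Vector3 so it runs anywhere, and the plane-distance calculation is just an example I picked, not something taken from the article. How much faster the second version is depends on the runtime and compiler, so profile before and after.

```csharp
using System;

// Stand-in for a vector type with operator and method calls.
public struct Vec3
{
    public float x, y, z;

    public Vec3(float x, float y, float z) { this.x = x; this.y = y; this.z = z; }

    public static Vec3 operator -(Vec3 a, Vec3 b)
    {
        return new Vec3(a.x - b.x, a.y - b.y, a.z - b.z);
    }

    public static float Dot(Vec3 a, Vec3 b)
    {
        return a.x * b.x + a.y * b.y + a.z * b.z;
    }
}

public static class PlaneMath
{
    // Method-call style: one operator call plus one Dot call per point.
    public static float SignedDistanceWithCalls(Vec3 point, Vec3 planePoint, Vec3 planeNormal)
    {
        return Vec3.Dot(point - planePoint, planeNormal);
    }

    // Hand-coded style: the same arithmetic written out per component,
    // with no temporary vector and no calls.
    public static float SignedDistanceInline(Vec3 point, Vec3 planePoint, Vec3 planeNormal)
    {
        float dx = point.x - planePoint.x;
        float dy = point.y - planePoint.y;
        float dz = point.z - planePoint.z;
        return dx * planeNormal.x + dy * planeNormal.y + dz * planeNormal.z;
    }
}
```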

Anyway, there are probably a lot of things you can do to make it all run a lot faster.


Thanks! I’ll look into that article. I’ve switched to arrays, and the data structure holds a lot less now, meaning a lot less to calculate per cycle. The actual function is very simple (some if-else stuff, about 20 lines of code overall); the main problem is that Unity has to run it hundreds of times per second. It seems that I can’t really use multithreading because the function uses Unity classes. Is there something else I can do? Do you think doing the calculations on the GPU will help?

Another technique is to wait some time before iterating the list again, for example doing it only every 10 frames or every tenth of a second, so it doesn’t run every frame.

float lastRaycastsTime;

void Update()
{
    // Only do the raycasts every 1/10 of a second
    if (Time.time - lastRaycastsTime > 0.1f)
    {
        lastRaycastsTime = Time.time;

        // Do the raycasts now

        return; // Optional: exit the Update function
    }
}

Here are my rough questions to ask when optimising.

  • Do I need to do the calcs at all? Can they be faked sufficiently? Does it really matter to my game?
  • Do I need to do the calcs so many times? Can I cull off screen stuff?
  • Do I need to do the calcs so often? Is once every 10 frames enough? Once every 10 seconds? Only 10% per frame?
  • Is there a more efficient algorithm? Does a custom structure which just solves my issue perform better than the built-in Unity one?
  • Can it be micro optimised? Can frequently called methods be inlined?

Most of the time I never get to the end of the list. It’s important to stop as soon as desired performance is reached.


Wow, your trick actually works, I was getting ~90 now I’m getting around 140 fps. Thanks!

Just to give some idea of scale, here’s my main FixedUpdate() function in the boat sim. The function calls throughout all this are in many cases much longer than all of this code, and many of those functions iterate over a few hundred triangles every physics cycle. That means I’m iterating over all of the triangles not just once, but multiple times per physics cycle doing different computations on them, and I’m doing all this 100 times per second which is at least double what I see other people around here typically running. I can’t do 20 boats at a time with this (arcade games need not apply for this method), but can get away with doing a few boats at least. As you can see there are a lot of “if” statements here, and there are a few more in some of those function calls that aren’t shown here. Many of those “if” statements are processed on every triangle. So while your FixedUpdate() might have a few “ifs,” mine is probably in the hundreds if not thousands. Probably something else you’re doing in there is slowing you down, a few “ifs” every frame is nothing.

This class is a little less than 2000 lines and probably half or more of that is run every physics cycle. So you really can do a monumental amount of work in FixedUpdate() if you’re careful and pay close attention to optimizing things. In early development when this was all much smaller I was using 50%+ of the CPU just for physics on one boat. By paying close attention to data structuring, not calling functions (especially math functions) when they can be hand coded, I got that down to probably 5% CPU for one boat.

Someone said something about trying to do the work every few frames instead of every frame. That’s a good idea. Another one that’s similar is to try to spread it out. For example: I wrote the physics engine for VRC Pro (an R/C car racing sim) which included a particle system for engine smoke and dust blowing around and so on. The physics there ran 250Hz or 500Hz depending on the vehicle (AFAIK most Unity games run only 50Hz), and the smoke/dust was done every physics frame. So to make the particle system run stupidly fast I split the particle array into groups and only moved one of the groups every frame. Think I got something like an 800% speed improvement that way which meant I could run a lot more particles.

The idea there was to do the same amount of particle system work every frame (move xx particles), but only one group of them at a time. I.e., on physics frame 1 I’d move particles 0-500, on frame 2 I’d move particles 500-1000, and so on. (It wasn’t really 500, I don’t remember how many it ended up being. I just put a couple numbers in there to tune it by hand until it looked good enough and performed well).
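A bare-bones sketch of that rotation, with made-up names (GroupedUpdater, Step, UpdateCount) and a counter standing in for the actual particle move. This is not the VRC Pro code, just the same idea in a few lines.

```csharp
using System;

public sealed class GroupedUpdater
{
    // How many times each element has been processed so far.
    public readonly int[] UpdateCount;

    private readonly int groupSize;
    private int nextStart; // index where the next group begins

    public GroupedUpdater(int elementCount, int groupSize)
    {
        if (groupSize <= 0) throw new ArgumentOutOfRangeException(nameof(groupSize));
        UpdateCount = new int[elementCount];
        this.groupSize = groupSize;
    }

    // Call once per physics step: only one group is processed per call.
    public void Step()
    {
        if (UpdateCount.Length == 0) return;

        int end = Math.Min(nextStart + groupSize, UpdateCount.Length);
        for (int i = nextStart; i < end; i++)
        {
            UpdateCount[i]++; // stand-in for "move this particle"
        }

        // Wrap around to the first group after the last one.
        nextStart = end >= UpdateCount.Length ? 0 : end;
    }
}
```

Because each element is only touched once per full rotation, real code would probably need to account for the longer time between updates of the same element. That part is my assumption and depends on what the work actually is.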

None of this was matched up to the graphics frame rate at all, it was all equivalent to doing it in FixedUpdate(), but you couldn’t tell by looking at it. So there was not really much (if any) CPU spiking from the particle system because only part of the work was done every physics cycle and graphics frame.

So rather than skipping the work for several frames and then doing it all at once, it’s better imo to try to spread it out over a few frames if you can. Sometimes you can do that, other times you can’t. You’re the only one that could know what will work for you though, of course.

So yeah, I suspect that if you’re seeing spikes that are so game-stuttering, you’re probably calling some enormously slow functions in there somewhere. You should be able to do many thousands of lines of code in FixedUpdate() without seeing even a hiccup.

    void FixedUpdate() {
        int numLoops;
        float tStep;

        if (isPropeller)
        {
            numLoops = numLoopsIsPropeller;// 10;
            forcesFluidDynamicIsPropeller.numberOfForces = 0;
        }
        else
        {
            numLoops = 1;
        }

        tStep = Time.fixedDeltaTime / numLoops;
        propTorque = 0; //Reset

        for (int iLoop = 0; iLoop < numLoops; iLoop++)
        {
            if (skipFixedUpdate || skipPhysics)
            {
                return;
            }

            if (isPropeller)
            {

                propAngleChange = propRadPerSec * tStep * Mathf.Rad2Deg;
                transform.Rotate(0, 0, propAngleChange);
            }

            transformRight = transform.right;
            transformUp = transform.up;
            transformForward = transform.forward;
            transformPositionFixedUpdate = transform.position;

            positionCenterOfMass = useRigidBody.worldCenterOfMass;

            TransformPhysicsTrianglesFromLocalToWorldPosition(physicsTrianglesLocal, physicsTrianglesLocal.position, physicsTrianglesWorld.position);
            TransformPhysicsTrianglesFromLocalToWorldPosition(physicsTrianglesLocal, physicsTrianglesLocal.vert0, physicsTrianglesWorld.vert0);
            TransformPhysicsTrianglesFromLocalToWorldPosition(physicsTrianglesLocal, physicsTrianglesLocal.vert1, physicsTrianglesWorld.vert1);
            TransformPhysicsTrianglesFromLocalToWorldPosition(physicsTrianglesLocal, physicsTrianglesLocal.vert2, physicsTrianglesWorld.vert2);

            TransformPhysicsTrianglesFromLocalToWorldDirection(physicsTrianglesLocal, physicsTrianglesLocal.faceNormal, physicsTrianglesWorld.faceNormal);
            TransformPhysicsTrianglesFromLocalToWorldDirection(physicsTrianglesLocal, physicsTrianglesLocal.vNormalized0, physicsTrianglesWorld.vNormalized0);
            TransformPhysicsTrianglesFromLocalToWorldDirection(physicsTrianglesLocal, physicsTrianglesLocal.vNormalized1, physicsTrianglesWorld.vNormalized1);
            TransformPhysicsTrianglesFromLocalToWorldDirection(physicsTrianglesLocal, physicsTrianglesLocal.vNormalized2, physicsTrianglesWorld.vNormalized2);

            //restore face areas in physicsTrianglesWorld.  This only needs to be done if triangle splitting is enabled because it will overwrite the face areas.

            if (splitTrianglesAtWaterline)
            {
                for (int i = 0; i < physicsTrianglesWorld.numberOfTriangles; i++)
                {
                    physicsTrianglesWorld.faceArea[i] = physicsTrianglesLocal.faceArea[i];
                }
                interceptPointPairs.numberOfPointPairs = 0; //Reset the counter.  The SplitTriangles function will increase the number internally as intercepts are found.
                SplitTriangles(physicsTrianglesWorld, interceptPointPairs);
            }

            ComputeWorldPointVelocities(physicsTrianglesWorld, useRigidBody, physicsTrianglesWorld.position, physicsTrianglesWorld.faceVelocity);

            if (isPropeller)
            {
                AddPropVelocitiesFromWorldPositionsSpinAxisZ(physicsTrianglesWorld.numberOfTriangles, propRadPerSec, transform.forward, physicsTrianglesWorld.position, physicsTrianglesWorld.faceVelocity);
            }

            DetermineIfTriangleIsAboveOrBelowWaterline(physicsTrianglesWorld, indicesWaterlineAbove, indicesWaterlineBelow);

            if (buoyancy)
            {
                forcesBuoyancy.numberOfForces = 0; //Reset number of forces to 0 so we can call the Compute functions more than once without having the forces array overwrite itself.
                ComputeBuoyancyForces(forcesBuoyancy, physicsTrianglesWorld, waterPlaneNormal, waterPlanePositionWorld, indicesWaterlineBelow);
                AddForcesToRigidBody(forcesBuoyancy, useRigidBody,numLoops);
            }

            if (fluidDynamics)
            {
                forcesFluidDynamic.numberOfForces = 0; //Reset number of forces to 0 so we can call the Compute functions more than once without having the forces array overwrite itself.
                ComputeFluidDynamicForces(forcesFluidDynamic, physicsTrianglesWorld, waterPlaneNormal, waterPlanePositionWorld, indicesWaterlineAbove, airDensity);
                totalHydrodynamicForce.x = 0;
                totalHydrodynamicForce.y = 0;
                totalHydrodynamicForce.z = 0;
                ComputeFluidDynamicForces(forcesFluidDynamic, physicsTrianglesWorld, waterPlaneNormal, waterPlanePositionWorld, indicesWaterlineBelow, waterDensity);
                AddForcesToRigidBody(forcesFluidDynamic, useRigidBody, numLoops);

                if(isPropeller)
                    AppendForcesIsPropeller(forcesFluidDynamic, forcesFluidDynamicIsPropeller);
            }

            ComputeWaterlineFrontPoint(interceptPointPairs); //This needs to be done before skin friction because the Reynold's number used there depends on the length to the front waterline intersection.

            if (skinFriction)
            {
                forcesSkinFriction.numberOfForces = 0; //Reset number of forces to 0 so we can call the Compute functions more than once without having the forces array overwrite itself.
                ComputeSkinFrictionForces(forcesSkinFriction, physicsTrianglesWorld, waterPlaneNormal, waterPlanePositionWorld, indicesWaterlineBelow, waterDensity);
                AddForcesToRigidBody(forcesSkinFriction, useRigidBody, numLoops);
            }


            if (isPropeller)
            {
                propTorque -= ComputePropellerTorqueFromWorldCoords(forcesFluidDynamic);
            }
        }
        propTorque = propTorque / numLoops;
      
    }

If you want to post the code here I’ll be happy to look. I imagine the others here will as well.

GPU: I do a fair bit of compute shader work so have some experience with this. The answer is a big “maybe, but probably not.” If you have to read anything back from GPU memory to the CPU side then no, unfortunately this is still too slow on GPU to be very helpful unless your math is ridiculously heavy (even in my case the GPU would end up slowing things down just because of the data readback that would be required every frame, it’s a real bummer and limits me quite a bit).

It can take 10-15ms to do the first computeBuffer.GetData() call every frame even if you’re just pulling a single float out of the GPU which immediately limits you to a lower frame rate than you’re getting now. You can do stupendous amounts of work on the GPU in virtually zero time, but the readback to the CPU can kill you. So in your case you’re probably better off not wasting time with the GPU.

The GPU is also a lousy place to do logic, so by your description of your code with the “if” statements, it’s probably not a good GPU candidate anyway even if the data readback was fast. CPU rocks at logic, very fast, but the GPU is slow because the processors can only process one branch at a time, so a single “if” can cut the speed in half and you start to lose the benefit of the high level of parallelization on it. A couple years ago I wrote a really big compute shader with maybe three to five “if” statements in it. By doing some math tricks to get rid of those “ifs”, the whole thing sped up around 500%. It sounds like you probably have a lot of divergence, so it’ll probably be slow. The readback to the CPU side would kill you anyway if you have to do even one for any amount of data at all.

Aside: Going back to the earlier example of just using a fixed size array. If you look in the code I just posted you’ll see a few variables like this:

forcesSkinFriction.numberOfForces=0;
forcesBuoyancy.numberOfForces=0;

There are a few of those sprinkled in there. Those are counters that say “this is how many forces of this type there are,” which is used elsewhere to loop from 0 to that number of forces. The counter just gets incremented, and whatever is needed is written straight into the array at that index when it’s time to compute that index. There is no List.Clear() or List.Add() or array equivalent. If I had 500 SkinFriction forces last frame but only 400 this frame, the 400 new forces are written to indexes 0 through 399 and the old entries at 400 through 499 just stay in the array. There’s no need to clear them. Because forcesSkinFriction.numberOfForces = 400, it won’t add those 100 stale forces on the next frame. It just processes 0 through 399, adds those to the boat, and ignores whatever old stuff might be sitting at higher indexes.
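Stripped down to a single float per force, the counter pattern might look like the sketch below. ForceBuffer, TryAdd and Sum are names invented for this example; the real structures in the boat sim hold several arrays per force and are not shown in this thread.

```csharp
using System;

public sealed class ForceBuffer
{
    // Allocated once at the maximum size that could ever be needed.
    public readonly float[] Magnitude;

    // Number of valid entries this cycle; everything above it is stale.
    public int NumberOfForces;

    public ForceBuffer(int capacity)
    {
        Magnitude = new float[capacity];
    }

    // "Clearing" is just resetting the counter; the array is left alone.
    public void Reset()
    {
        NumberOfForces = 0;
    }

    // Writes at the current index and bumps the counter.
    // Returns false instead of growing when the buffer is full.
    public bool TryAdd(float magnitude)
    {
        if (NumberOfForces >= Magnitude.Length) return false;
        Magnitude[NumberOfForces++] = magnitude;
        return true;
    }

    // Consumers only ever loop up to NumberOfForces.
    public float Sum()
    {
        float total = 0f;
        for (int i = 0; i < NumberOfForces; i++)
        {
            total += Magnitude[i];
        }
        return total;
    }
}
```

Nothing is allocated or cleared per cycle, so there is no garbage for the collector and the data stays contiguous in memory.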

There are multiple passes over those triangles every frame too. For example, ComputeFluidDynamicForces() iterates over all of the triangles (including the “new” ones that are created each frame during the waterline splitting process). Then in the next block the ComputeSkinFrictionForces() iterates over all of the triangles again, etc… Each of those functions is bigger than your entire FixedUpdate() right now by the sounds of it, and they run a loop of hundreds of triangles, and there are a handful of “ifs” at each one of those. The key is to write those to be very efficient, operating on data packed as tightly together in memory as it can be so the CPU will cache a lot of it, and the functions themselves are small. It’s a “data oriented design” approach which can often yield pretty huge speed increases.

Anyway, it’s just stuff like that… If you’re careful, you can make things run really fast and do a lot more in FixedUpdate() than you probably think you can. :stuck_out_tongue:

https://www.youtube.com/watch?v=AXAL2a3wuww


Todd, you were right the whole time lol. I rewrote the whole thing; now the work is distributed across 3 arrays depending on how far away the objects are from the player. I process the data for objects near the player first and at shorter intervals. I guess this does take a bit more memory, but it’s a lot faster. That article you linked was quite useful as well.
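For anyone wanting to try the same split, here is one way the bucketing could be sketched. The thresholds, the intervals (1, 5 and 20 steps) and the names DistanceBuckets, BucketFor and IsDue are all hypothetical, not the values from the actual rewrite.

```csharp
using System;

public static class DistanceBuckets
{
    // Hypothetical update intervals, in physics steps, for near / mid / far.
    public static readonly int[] IntervalInSteps = { 1, 5, 20 };

    // Picks bucket 0 (near), 1 (mid) or 2 (far) from the distance to the player.
    public static int BucketFor(float distance, float nearLimit, float midLimit)
    {
        if (distance < nearLimit) return 0;
        if (distance < midLimit) return 1;
        return 2;
    }

    // True when the given bucket should be processed on this step.
    public static bool IsDue(int bucket, int stepIndex)
    {
        return stepIndex % IntervalInSteps[bucket] == 0;
    }
}
```

Each object would be placed in the array for its bucket, and FixedUpdate() would loop over a bucket’s array only on the steps where IsDue returns true.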

I ran my game without this script and noticed my other scripts were taking a lot of resources as well. It turns out that one of my scripts, which handled lighting, had a huge bug that I overlooked. I’m still optimizing my other scripts. I was getting as low as 30 fps; now I’m getting a constant 1100 fps on an i5 processor. I still have to check performance on mobile devices though. It amazes me how much you can optimize ~20 lines of code. Thanks again.


30 to 1100, now you’re talking. :smile: