Race condition testing and lock performance when using System.Threading

I have been working on a project where a number of threads need to access a global array. I made a test script and got some very interesting results, and I wanted to share them in case they might be beneficial to someone.

Briefly, I found that:

  1. Locks work as intended if they are used on both threads accessing the variable.
  2. Without locks, if two threads work on the same array of structs simultaneously, there is roughly a 1% chance of a race condition occurring. If you can tolerate this, it saves you from worrying about locks.
  3. Locking each variable just before every write/read costs a lot, almost 20 times as much as not using locks.
  4. If you lock the variable at the very beginning, the other thread has to wait, so the work won't be truly simultaneous.
  5. Most interestingly, if you lock on only one thread, and only just before writing/reading, it avoids a lot of the race conditions (dropping to only ~0.02%) while costing only about 5 times as much (instead of 20).

using System.Collections;
using System.Collections.Generic;
using UnityEngine;
using System.Diagnostics;
using System.Threading;
using System.Linq;

public class multithread_testing2 : MonoBehaviour
{
    static bool finished_sideThread = false;
    static bool finished_mainThread = false;
    string timeElapsed_SideThread;
    string timeElapsed_MainThread;

    Vector3[] v3_array = new Vector3[1000];
    Vector3[] v3_reference_array = new Vector3[1000];

    private void Update()
    {
        if (finished_sideThread && finished_mainThread)
        {
            int raceConditions = 0;
            finished_sideThread = false;
            // TODO: check that v3_array.Length and v3_reference_array.Length are the same
            for (int i = 0; i < v3_array.Length; i++)
            {
                if (v3_array[i] != v3_reference_array[i])
                {
                    raceConditions += (int)(v3_reference_array[i].x - v3_array[i].x);
                    UnityEngine.Debug.Log("Difference at element " + i.ToString() + ", " + v3_array[i].ToString() + " != " + v3_reference_array[i].ToString());
                }
            }
            print("total number race conditions occured:" + raceConditions.ToString() + "\nTime elapsed on side thread:" + timeElapsed_SideThread + "\nTime elapsed on main thread:" + timeElapsed_MainThread);
        }
    }


    void Start()
    {
        for (int i = 0; i < v3_array.Length; i++) //initialize the variables on main thread.
        {
            v3_array[i] = new Vector3(0f, 0f, 0f);
            v3_reference_array[i] = new Vector3(0f, 0f, 0f);
        }
        new Thread(WorkWork).Start(); //Start side thread. Race conditions occur here.

        #region MainThread WorkWork
            Stopwatch mt_timer = new Stopwatch();
            int p=0;
            mt_timer.Start(); //Start timer before lock
            //lock (v3_array)//comment to exclude lock (1ST LOCK)
            {
                while (p < 100000)
                {
                    p++;
                    //lock (v3_array)//comment to exclude lock (2ND LOCK)
                    {
                        for (int i = 0; i < v3_array.Length; i++)
                        {
                            //lock (v3_array)//comment to exclude lock (3RD LOCK)
                            v3_array[i].x += 1f;

                            v3_reference_array[i].x += 2f;
                        }
                    }
                }
            }
            mt_timer.Stop();
            timeElapsed_MainThread = mt_timer.Elapsed.ToString();
            finished_mainThread = true;
        #endregion

        //new Thread(WorkWork).Start(); //Start the side thread here instead: no race conditions occur if it runs after the main-thread work
    }



    void WorkWork() //continuously modifies the shared array
    {
        Stopwatch timer = new Stopwatch();
        int p = 0;
        timer.Start(); //Start timer before lock
        //lock (v3_array)//comment to exclude lock (4TH LOCK)
        {

            if (v3_reference_array[0].x == 200000) print("MainThread finished before SideThread started!");
            while (p < 100000)
            {
                //lock (v3_array)//comment to exclude lock (5TH LOCK)
                {
                    p++;
                    for (int i = 0; i < v3_array.Length; i++)
                    {
                        lock (v3_array)//comment to exclude lock (6TH LOCK)
                            v3_array[i].x += 1f;
                    }
                }
            }
        }
        timer.Stop();
        timeElapsed_SideThread = timer.Elapsed.ToString();
        finished_sideThread = true;
    }
}

RESULTS:
//without any lock, using non-static variables
// total number race conditions occured: 969047 in 100000000 =~1%
//Time elapsed on side thread: 00:00:00.3689510
//Time elapsed on main thread: 00:00:00.4870519
// total number race conditions occured: 792564
//Time elapsed on side thread: 00:00:00.3165800
//Time elapsed on main thread: 00:00:00.4532916

//without locks using static variables:
// total number race conditions occured: 978096 in 100000000 =~1%
//Time elapsed on side thread: 00:00:00.3232456
//Time elapsed on main thread: 00:00:00.4566593
// total number race conditions occured: 961748
//Time elapsed on side thread: 00:00:00.3216992
//Time elapsed on main thread: 00:00:00.4596025

//with locks 3,6 using non-static variables:
// total number race conditions occured: 0
//Time elapsed on side thread: 00:00:13.4967897
//Time elapsed on main thread: 00:00:13.7488787
// total number race conditions occured: 0
//Time elapsed on side thread: 00:00:14.9234585
//Time elapsed on main thread: 00:00:15.2244580

//with lock 1 and 4 using non-static variables: “MainThread finished before SideThread started!”
// total number race conditions occured: 0
//Time elapsed on side thread: 00:00:00.6029872
//Time elapsed on main thread: 00:00:00.3307784
// total number race conditions occured: 0
//Time elapsed on side thread: 00:00:00.5803853
//Time elapsed on main thread: 00:00:00.3024417

//with locks 3,6 using static variables:
// total number race conditions occured: 0
//Time elapsed on side thread: 00:00:16.5628217
//Time elapsed on main thread: 00:00:16.6551825
// total number race conditions occured: 0
//Time elapsed on side thread: 00:00:14.0155613
//Time elapsed on main thread: 00:00:14.1519171

//with lock 1 and 4 using static variables: “MainThread finished before SideThread started!”
// total number race conditions occured: 0
//Time elapsed on side thread: 00:00:00.5682462
//Time elapsed on main thread: 00:00:00.3209186
// total number race conditions occured: 0
//Time elapsed on side thread: 00:00:00.5819123
//Time elapsed on main thread: 00:00:00.3230487

//with locks 2,5 using static variables:
// total number race conditions occured: 0
//Time elapsed on side thread: 00:00:00.6816896
//Time elapsed on main thread: 00:00:00.6584224
// total number race conditions occured: 0
//Time elapsed on side thread: 00:00:00.4032680
//Time elapsed on main thread: 00:00:00.6552052

//with locks 2,5 using non-static variables:
// total number race conditions occured: 0
//Time elapsed on side thread: 00:00:00.6975935
//Time elapsed on main thread: 00:00:00.6521430
// total number race conditions occured: 0
//Time elapsed on side thread: 00:00:00.6321385
//Time elapsed on main thread: 00:00:00.3994793

//using only 4TH lock:
// total number race conditions occured: 522735 in 100000000 =~0.5%
//Time elapsed on side thread: 00:00:00.4555784
//Time elapsed on main thread: 00:00:00.5212350
// total number race conditions occured: 519063
//Time elapsed on side thread: 00:00:00.4844576
//Time elapsed on main thread: 00:00:00.5330215

//using only 5TH lock:
// total number race conditions occured: 642547 in 100000000 =~0.5%
//Time elapsed on side thread: 00:00:00.4665403
//Time elapsed on main thread: 00:00:00.5322832
// total number race conditions occured: 396752
//Time elapsed on side thread: 00:00:00.4863987
//Time elapsed on main thread: 00:00:00.5260168

//using only 6TH lock:
// total number race conditions occured: 19731 in 100000000 =~0.02%
//Time elapsed on side thread: 00:00:02.6481066
//Time elapsed on main thread: 00:00:00.5695836
// total number race conditions occured: 18969
//Time elapsed on side thread: 00:00:02.6499795
//Time elapsed on main thread: 00:00:00.5332299

Interesting. Thank you for sharing. It reminds me of the long chapter in the book I used to prepare for my C# certification. I can't say I've often seen race condition failures under Unity. Still, you did a lot of work trying all the possibilities. You have dug deeper than me ++

A small note: using the keyword before v3_array did not have any effect.

You look to be way ahead of me.

I'm reading this to learn about threads etc.:

https://www.jacksondunstan.com/articles/5522

There are so many things wrong here, so I’ll try to tackle them one by one.

First of all, where and when are race conditions acceptable? Note that we’re not talking about pure read races in a producer-consumer setup; we’re talking about two or more threads doing read-modify-write on shared data. Such race conditions can result in completely random and unpredictable data.

The way you “check” or “count” your race conditions is also flawed. If race conditions were predictable, they wouldn’t be such a pain ^^. The differences between your two arrays are not really a good indicator of how many race conditions happen. In your case you’re just doing an increment by 1, so the whole read-modify-write is quite fast. Even on a single-core CPU you can get a race condition if the thread is suspended right between the read and the write. On multi-core CPUs it just gets progressively worse, since there are separate caches on both sides. So it’s possible that one thread races ahead of the other, and the other thread may “undo” several iterations of the first one, or vice versa. If you think you can monitor and count race conditions, just forget about it.

On top of that, different hardware designs may produce completely different behaviour. In your tests you haven’t even pinned the threads to specific cores, which means the OS could schedule the threads on different cores each time (maybe look up thread affinity). So depending on the hardware used, the results may look completely different. That’s the biggest issue with not handling race conditions properly: it may work perfectly fine on one system but completely break on another.
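To make that read-modify-write problem concrete, here is a minimal plain-C# sketch (a console program, not Unity; the class and counter names are made up for illustration) showing how unsynchronized increments lose updates while Interlocked.Increment does not:

using System;
using System.Threading;

class LostUpdateDemo
{
    static int unsafeCounter = 0;
    static int safeCounter = 0;

    static void Main()
    {
        Thread t1 = new Thread(Work);
        Thread t2 = new Thread(Work);
        t1.Start(); t2.Start();
        t1.Join(); t2.Join();

        // unsafeCounter usually ends up below 2000000 because "++" is a
        // non-atomic read-modify-write: increments from one thread get lost.
        Console.WriteLine("unsafe: " + unsafeCounter);
        // safeCounter is always exactly 2000000.
        Console.WriteLine("safe:   " + safeCounter);
    }

    static void Work()
    {
        for (int i = 0; i < 1000000; i++)
        {
            unsafeCounter++;                        // racy read-modify-write
            Interlocked.Increment(ref safeCounter); // atomic increment
        }
    }
}

How many increments are lost varies from run to run and from machine to machine, which is exactly why counting race conditions the way the original script does is not a reliable measurement.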

Not every process is suited to being multithreaded. In cases where you have to do the same dull operation over a large set of values, you generally want to apply a divide-and-conquer approach. So if you spawn 4 threads, each thread would just process a quarter of the whole array without any locks, because there is no concurrent write access to any of the data: every thread works on its own subset. That’s where the new C# “Span” type or Unity’s NativeSlice would come in handy.
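As a small illustration of that divide-and-conquer idea (the per-element work, array contents and thread count below are arbitrary placeholders), each thread owns a disjoint index range of the array, so no locks are needed at all:

using System.Threading;

public static class ChunkedWork
{
    // Each thread touches only the elements in its own [start, end) range,
    // so there is no concurrent write access to any single element.
    public static void ProcessInParallel(float[] data, int threadCount)
    {
        Thread[] workers = new Thread[threadCount];
        int chunk = data.Length / threadCount;

        for (int t = 0; t < threadCount; t++)
        {
            int start = t * chunk;
            // The last thread also takes the remainder of the division.
            int end = (t == threadCount - 1) ? data.Length : start + chunk;
            workers[t] = new Thread(() =>
            {
                for (int i = start; i < end; i++)
                    data[i] += 1f; // placeholder for the real per-element work
            });
            workers[t].Start();
        }

        foreach (Thread worker in workers)
            worker.Join(); // the array is only in a consistent state after all chunks are done
    }
}

The only synchronisation point is the Join at the end; during the work itself there are no locks because no element is ever shared between threads.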

Often, when you have parallel processes, it’s better to throw more memory at the problem. So when you have two threads that should modify the same array, each with a complex and expensive algorithm, you can simply duplicate the array, have each thread work on its own copy, and once everything is finished combine the two results on a single thread. And if you apply divide and conquer here, even that final combine step can be made faster. Note that I said “can” because whether threading actually brings you any benefit depends highly on the actual number of elements. For small element counts you probably burn more time on the overhead than you save.
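A minimal sketch of that copy-then-combine idea, assuming two made-up per-thread transformations and a simple additive combine step:

using System;
using System.Threading;

public static class CopyAndCombine
{
    public static float[] Process(float[] source)
    {
        // Each thread gets its own private copy, so there is no shared write access at all.
        float[] copyA = (float[])source.Clone();
        float[] copyB = (float[])source.Clone();

        Thread a = new Thread(() => { for (int i = 0; i < copyA.Length; i++) copyA[i] += 1f; });
        Thread b = new Thread(() => { for (int i = 0; i < copyB.Length; i++) copyB[i] *= 2f; });
        a.Start(); b.Start();
        a.Join(); b.Join();

        // Combine the two independent results on a single thread afterwards.
        float[] result = new float[source.Length];
        for (int i = 0; i < result.Length; i++)
            result[i] = copyA[i] + copyB[i]; // placeholder combine rule
        return result;
    }
}

Whether the extra copies pay off depends, as said above, on the element count and on how expensive the per-element work really is.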

Note that race conditions are affected by so many unpredictable things: branch prediction, out-of-order execution, and of course all the other threads in the system. When your thread is suspended by the OS, you don’t know when it will be resumed or how long it will run until the next interruption. There are literally several thousand threads running in your OS right now; your threads are just two of them.

Note that a “lock” does not lock the variable. Locks are taken on objects, any object for that matter. The lock just acts as a flag / mediator between two threads that both want to acquire it; it has nothing to do with the actual object or the variable. So “locking” on one thread but not the other has no locking effect at all. It just slows down the one thread that does the locking in each iteration, while you still have the risk of race conditions.
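For contrast, here is a minimal sketch of locking used correctly: both threads go through the same dedicated lock object for every access to the shared state (the _sync field name is just a common convention, not something from the code above):

using System.Threading;

public class SharedCounter
{
    // A dedicated lock object; mutual exclusion only works because
    // every thread that touches _value takes this same lock.
    private readonly object _sync = new object();
    private int _value;

    public void Increment()
    {
        lock (_sync)      // thread A and thread B both have to enter here
        {
            _value++;     // the read-modify-write is now exclusive
        }
    }

    public int Read()
    {
        lock (_sync)      // reads of shared mutable state go through the same lock
        {
            return _value;
        }
    }
}

If only one of the two threads used lock (_sync), the other thread could still read and write _value at any time, which is exactly the “no locking effect at all” situation described above.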

So all in all it’s not really clear what you’re after here. I’ve always said that understanding multithreading as a concept is easy; the difficult part is designing multithreaded code that is actually efficient and safe. So this is more of a design problem than a conceptual problem. Having two threads messing with the same data at the same time is just an absolute no-go.

Having multiple threads read the same data at the same time is never an issue. It becomes an issue when the data is modified. If data is ever modified, only a single thread should be modifying it at any given time. There are various lock-free designs for the case of a single producer (one that writes data) and multiple consumers, though the exact details depend on the concrete use case. For example, you never ever want two threads to be able to add or remove items from a normal List at the same time. An add or remove operation can either create a new array internally or move all the elements around in a loop. If two threads mess with a list at the same time, you can get anything; you can even lose unrelated data in the list because it was being moved around by one thread.
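As one concrete way to get a safe single-producer / single-consumer setup without writing your own lock-free structure, .NET’s BlockingCollection<T> (which wraps a ConcurrentQueue<T> by default) can be used. This is only a generic sketch with arbitrary item counts, not a claim about what the original script should do:

using System.Collections.Concurrent;
using System.Threading;

public static class ProducerConsumerDemo
{
    public static void Run()
    {
        using (var queue = new BlockingCollection<int>())
        {
            // Single producer: the only thread that ever adds items.
            Thread producer = new Thread(() =>
            {
                for (int i = 0; i < 1000; i++)
                    queue.Add(i);
                queue.CompleteAdding(); // signals that no more items will come
            });

            // Consumer: GetConsumingEnumerable blocks while the queue is empty
            // and ends once CompleteAdding was called and the queue is drained.
            Thread consumer = new Thread(() =>
            {
                foreach (int item in queue.GetConsumingEnumerable())
                {
                    // process item...
                }
            });

            producer.Start();
            consumer.Start();
            producer.Join();
            consumer.Join();
        }
    }
}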

So unless you really, really know what you’re doing, I would highly recommend forgetting about all these tests, which concentrate way too much on the implementation details of your particular system, and focusing on a robust code design instead.

You’ve made two points, I think. It seems you’re claiming the OP:

  1. Is testing threading the wrong way
  2. Doesn’t need to be testing because he doesn’t need threading

Could you just back up and presume that he does need threading, and recommend the best way to do it safely without any significant performance drops?

:smile: exactly

@Bunny83, from some of your other threads I glean that you’re somewhat of an expert.

Could you recommend a load that creates waves (sine or similar) and then tests threading with the creation of these waves?

Particularly impressed by your complex numbers example. Blew my mind.

Well, I think I did make clear a couple of times that there is no “best way” and that it always depends on the actual problem you’re trying to solve. He only had this strange test case, which makes no sense; concurrent access to a huge amount of data just doesn’t make any sense. I already pointed out that not every process can be meaningfully multithreaded and that almost any attempt may just make it worse. As I also pointed out, usually the best way is to split the work into independent chunks and have each chunk processed by a separate thread, so you don’t need any syncing except at the end.
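For the sine-wave load asked about above, a minimal sketch along those lines, but using Parallel.For instead of manually created threads (this assumes the .NET 4.x scripting runtime in Unity so System.Threading.Tasks is available; sample rate, frequency and buffer size are whatever you pass in):

using System;
using System.Threading.Tasks;

public static class SineWaveJob
{
    // Fills "samples" with a sine wave. Parallel.For partitions the index range
    // across worker threads, so every sample is written by exactly one thread
    // and no locking is required.
    public static void Generate(float[] samples, float frequency, float sampleRate)
    {
        Parallel.For(0, samples.Length, i =>
        {
            samples[i] = (float)Math.Sin(2.0 * Math.PI * frequency * i / sampleRate);
        });
    }
}

Whether this is actually faster than a plain single-threaded loop again depends on the buffer size; for a few thousand samples the scheduling overhead can easily eat the gain.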

I would never claim to be an expert (though it depends on the topic). I just gathered knowledge over time, but you should never blindly trust anyone, no matter who it is ^^. I’m not sure what you mean by “load” or which complex numbers example you’re talking about.