[C#] How to use a Multi Threaded Job Queue for Math function

I am currently making a 2D top-down game in C#.
I want endless terrain, so I use a chunk system that generates the terrain with a SimplexNoise function.
My problem is that whenever the game generates new terrain, it lags for a short time.
To fix that I want to use multithreading. Because I have to calculate terrain very often and in a very short time, I want to use a job queue running in a separate thread.

Does anyone have an idea how this can be achieved?
Is there a lib or scripts I can use for this?

While splitting work across multiple frames using a coroutine is a great optimisation technique, it is actually possible to make full use of Mono’s multithreading API within your Unity scripts. To take advantage of this however, you need to make sure that you don’t use the non-thread-safe parts of Unity’s API in your own worker threads.

The general pattern I’d recommend is to divide the work into chunks (your terrain is an ideal example of this) and to separate out as much of the code that doesn’t need to touch Unity’s API as you can. You then execute these chunks of work using the ThreadPool API. The key is to have your threads write their results into a shared container, then read those results on the main thread once all the hard work has been done in parallel; at that point you can safely use them with Unity’s API.

So in your case, you’d probably want to generate all the vertex, triangle and normal arrays in your threads - because as long as you’re just doing math and filling items in the array, this is thread safe.

Once all the vert/tri/normal computation is done, you’d fall back to your main thread and assign these arrays to your mesh (because the Mesh class and its functions and properties are part of the Unity API and therefore most likely not thread safe).
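Since the question asks specifically for a job queue on a separate thread, here is a minimal sketch of that variant of the same pattern. It is not the ThreadPool example described in this answer, just an illustration under some assumptions: it needs a runtime that provides System.Collections.Concurrent (.NET 4 or later), the names TerrainJob, TerrainResult and TerrainJobQueue are hypothetical, and Noise is only a placeholder standing in for your SimplexNoise function. The worker does pure math; nothing in it touches Unity's API.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical job description: which chunk to generate and how big it is.
public struct TerrainJob { public int ChunkX, ChunkY, Size; }

// Hypothetical result: the height values for one chunk.
public class TerrainResult { public int ChunkX, ChunkY; public float[,] Heights; }

public class TerrainJobQueue : IDisposable
{
    readonly BlockingCollection<TerrainJob> jobs = new BlockingCollection<TerrainJob>();
    readonly ConcurrentQueue<TerrainResult> results = new ConcurrentQueue<TerrainResult>();
    readonly Thread worker;

    public TerrainJobQueue()
    {
        worker = new Thread(WorkLoop) { IsBackground = true };
        worker.Start();
    }

    // Main thread: request a chunk. Returns immediately.
    public void Enqueue(TerrainJob job) { jobs.Add(job); }

    // Main thread: poll for finished chunks; returns false if none is ready.
    public bool TryGetResult(out TerrainResult result) { return results.TryDequeue(out result); }

    void WorkLoop()
    {
        // Blocks while the queue is empty; the loop ends after CompleteAdding().
        foreach (TerrainJob job in jobs.GetConsumingEnumerable())
        {
            float[,] heights = new float[job.Size, job.Size];
            for (int x = 0; x < job.Size; x++)
                for (int y = 0; y < job.Size; y++)
                    heights[x, y] = Noise(job.ChunkX * job.Size + x, job.ChunkY * job.Size + y);

            results.Enqueue(new TerrainResult { ChunkX = job.ChunkX, ChunkY = job.ChunkY, Heights = heights });
        }
    }

    // Placeholder only: swap in the real SimplexNoise call here.
    static float Noise(int x, int y) { return (float)(Math.Sin(x * 0.1) * Math.Cos(y * 0.1)); }

    // Finishes the jobs already queued, then stops the worker thread.
    public void Dispose() { jobs.CompleteAdding(); worker.Join(); jobs.Dispose(); }
}
```

In a MonoBehaviour you would presumably create one TerrainJobQueue, call Enqueue when a chunk comes into range, and in Update loop over TryGetResult to turn finished height arrays into meshes or tiles on the main thread. A single worker keeps results in request order; several workers consuming the same queue would also work, at the cost of results arriving out of order.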

I’ve tried to create as generic an example as I can, so you and others can use it as a template for any multithreaded work that can be broken into chunks. In this example, I simply populate a large 2d array of integers using a function (which is contrived to be computationally expensive just to show the performance difference).

In the Start function, I first do all the work on a single thread to show how long that takes, then do the same work on multiple threads. Both versions are timed using the Stopwatch class and the results are printed in the game view using OnGUI.

I’ve tried to make the code here clear enough that you should be able to follow the logic and see how it works, but if you have any questions, ask them in the comments.

To try the script, paste the entire code into a single Unity C# script named “ThreadingExample” (including the two struct definitions at the bottom).

Place it on a GameObject in your scene, and hit play. It will lock the editor up for a few seconds as it performs the single-threaded example, followed by the multithreaded example. You should end up with results on screen something like the below screenshot. I have the CPU graph included on-screen to show the threading working - the “long hump” on CPU 0 is the single-threaded work, followed by the short hump on all CPUs, which is the same work done on multiple threads.


using UnityEngine;
using System.Collections;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;

public class ThreadingExample : MonoBehaviour {

	int[,] results;			// this is the 2d array which will contain the result
	int gridSize = 512;
	// thread related variables
	WorkItem[] workItems;
	ManualResetEvent[] doneEvents;	// a series of flags, each indicating if a given chunk has finished
	WorkChunk[] workChunks;			// the container for a chunk of work items
	int numChunks = 16;

	string output;
	IEnumerator Start()
	{
		Output("Setting up work chunks");
		yield return null;
		SetUpWorkChunks();

		Stopwatch stopWatch = new Stopwatch();

		// time the single-threaded version
		Output("Starting Main thread method");
		results = new int[gridSize,gridSize];
		yield return null;
		stopWatch.Start();
		DoNormalWork();
		stopWatch.Stop();
		Output("Main thread method: "+(stopWatch.ElapsedMilliseconds)+"ms");
		yield return null;

		// time the multi-threaded version
		Output("Starting Multi threaded method");
		results = new int[gridSize,gridSize];
		yield return null;
		stopWatch.Reset();
		stopWatch.Start();
		DoThreadedWork();
		stopWatch.Stop();
		Output("Multi-threaded method: "+(stopWatch.ElapsedMilliseconds)+"ms");
		yield return null;
	}

	void Output(string s)
	{
		output += s+"\n";
	}

	Vector2 scrollPosition;
	void OnGUI()
	{
		scrollPosition = GUILayout.BeginScrollView(scrollPosition);
		GUILayout.Label(output);
		GUILayout.EndScrollView();
	}

	void SetUpWorkChunks ()
	{
		// make list of all work items needing to be calculated
		workItems = new WorkItem[gridSize*gridSize];
		int i=0;
		for (int x=0; x<gridSize; ++x) {
			for (int y=0; y<gridSize; ++y) {
				workItems[i] = new WorkItem(x,y);
				i++;
			}
		}
		// share out work items between chunks (equal to number of threads allowed)
		workChunks = new WorkChunk[numChunks];
		doneEvents = new ManualResetEvent[numChunks];
		int numItemsPerChunk = workItems.Length / numChunks;
		for (int n = 0; n<workChunks.Length; ++n) {
			// work out which items this chunk should calculate
			int start = n * numItemsPerChunk;
			int end = n * numItemsPerChunk + (numItemsPerChunk - 1);
			// the last chunk also takes any leftover items
			if (n == workChunks.Length - 1) {
				end = workItems.Length - 1;
			}
			// copy portion of work items for this chunk
			WorkItem[] chunkWorkItems = new WorkItem[(end - start) + 1];
			System.Array.Copy (workItems, start, chunkWorkItems, 0, (end - start) + 1);
			// instantiate work chunk, passing the items
			workChunks[n] = new WorkChunk( chunkWorkItems, n, this );
			// we need a reference to each chunk's "doneEvent"
			doneEvents[n] = workChunks[n].doneEvent;
		}
		Output ("finished setting up work chunks");
	}

	void DoNormalWork ()
	{
		// this would be the non-threaded method of doing all work items:
		for (int n=0; n<workItems.Length; ++n)
		{
			DoWork( workItems[n] );
		}
	}

	void DoThreadedWork ()
	{
		// this loop tells all work chunks to do their work items simultaneously:
		for (int w = 0; w < workChunks.Length; w++) {
			doneEvents[w].Reset();
			ThreadPool.QueueUserWorkItem (workChunks[w].ThreadPoolCallback);
		}
		// Wait for all work chunks to complete their work...
		WaitHandle.WaitAll (doneEvents);
	}

	public void DoWork( WorkItem item )
	{
		// this is the work function which will occur in parallel on multiple threads
		// in this example, an arbitrary function, made deliberately slow with a loop
		float result = 0;
		for (int n=0; n<10000; ++n)
		{
			result += Mathf.Sqrt( item.x + item.y + n * 0.1f );
		}
		// put result into result array (each item writes its own cell, so no locking is needed)
		results[item.x, item.y] = (int)result;
	}
}


public struct WorkItem
{
	// This is the definition of a single item to be calculated.
	// In this example, it's basically just a 2d integer grid reference.
	public int x;
	public int y;
	public WorkItem (int x, int y)
	{
		this.x = x;
		this.y = y;
	}
}

struct WorkChunk
{
	// A work chunk contains an array of work items
	public WorkItem[] workItems;
	public ManualResetEvent doneEvent; // a flag to signal when the work is complete
	public int num;
	ThreadingExample workOwner; // a reference to the owner (since the actual DoWork function is there)

	public WorkChunk (WorkItem[] workItems, int num, ThreadingExample workOwner)
	{
		this.num = num;
		this.workItems = workItems;
		this.workOwner = workOwner;
		doneEvent = new ManualResetEvent(false);
	}

	public void ThreadPoolCallback (System.Object o)
	{
		doneEvent.Reset();
		// do each work item in this chunk's work item list:
		for (int i=0; i<workItems.Length; ++i) {
			workOwner.DoWork( workItems[i] );
		}
		doneEvent.Set ();
	}
}


What you can do is use an IEnumerator (a coroutine) and split the workload across several frames.


IEnumerator StreamOutWorld () {
    for (int i = 0; i < TerrainChunksToLoad.Count; i++) {
        Instantiate (TerrainChunksToLoad[i]);
        yield return null; // here it will skip to the next frame for the next "for" iteration
    }
    yield return null;
}
This loads only one mini-chunk per frame, and you can split up any number of functions this way. Just make sure that important work is not deferred by a frame, such as the colliders, so you don’t fall through ground that hasn’t loaded yet.
You’ll see examples of this in a lot of games where the world “grows out”, like Minecraft.
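If one item per frame turns out to be too slow, a variation on the same idea is to keep working until a per-frame time budget is used up and only then yield. The following is just a sketch of that variation, not something from the answer above: FrameBudget, Run, steps and budgetMs are hypothetical names, and each step is assumed to be a small unit of work such as instantiating one chunk.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Diagnostics;

public static class FrameBudget
{
    // Runs the steps in order. Whenever the time spent since the last yield
    // reaches budgetMs, it yields once so the caller can resume later.
    public static IEnumerator Run(IList<Action> steps, double budgetMs)
    {
        Stopwatch timer = new Stopwatch();
        timer.Start();
        for (int i = 0; i < steps.Count; i++)
        {
            steps[i]();
            if (timer.Elapsed.TotalMilliseconds >= budgetMs)
            {
                yield return null;   // as a Unity coroutine, this resumes on the next frame
                timer.Reset();
                timer.Start();
            }
        }
    }
}
```

In Unity this would presumably be started with StartCoroutine(FrameBudget.Run(steps, 2.0)), where the 2 ms budget is an arbitrary example value to tune for your game. Note that the budget is only checked between steps, so a single slow step can still overrun it.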