I thought it would be nice to discuss sorting in the context of the ECS and (more importantly) the Job System. Most of my research into parallel sorting turned up a common desire to ensure the list of items to be sorted is larger than some threshold, so as to avoid wasting thread resources unnecessarily. However, the Unity job system's ability to break data processing up into many tiny chunks seems like it would eliminate this concern, so most existing solutions can likely be optimized for use with Unity. Even if that were not true, I think it would be beneficial to discuss parallel sorting directly in the context of the job system, since the mechanism for writing parallel code with it is a bit different from writing parallel code outside of Unity.
So, with that in mind, I have implemented a sorting system via the IJobParallelForBatch interface. It is a merge sort, which, from my understanding, is well suited to parallelism. Please take a look and let me know what you think!
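To make the overall strategy easier to follow before diving into the Unity code, here is a minimal sketch of the same bottom-up scheme in plain Python (not Unity code, and the function name is mine, purely for illustration): sort adjacent pairs first, then merge sorted blocks of doubling width, ping-ponging between an input buffer and an output buffer, with ties going to the left half to preserve stability.

```python
def bottom_up_merge_sort(items):
    a = list(items)   # "input" buffer
    b = [0] * len(a)  # "output" buffer
    n = len(a)

    # Pass 1: sort the array into blocks of two (the TwoItemSort pass).
    for start in range(0, n - 1, 2):
        if a[start] > a[start + 1]:
            a[start], a[start + 1] = a[start + 1], a[start]

    # Merge passes: double the block width each time (the MergeSort passes).
    width = 2
    while width < n:
        for start in range(0, n, width * 2):
            mid = min(start + width, n)      # right half start
            end = min(start + width * 2, n)  # end of this batch
            i, j = start, mid
            for k in range(start, end):
                # Take from the right half only when it is strictly smaller,
                # so equal items keep their original (stable) order.
                if j < end and (i >= mid or a[j] < a[i]):
                    b[k] = a[j]; j += 1
                else:
                    b[k] = a[i]; i += 1
        a, b = b, a  # swap input/output (the aIsInput flip below)
        width *= 2
    return a
```

The swap at the end of each pass mirrors the a/b ping-pong in the Unity version: whichever buffer was just written becomes the input for the next, wider merge pass.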
Code
using Unity.Entities;
using Unity.Jobs;
using Unity.Collections;
using UnityEngine;
using Unity.Burst;
public class SorterSystem : JobComponentSystem
{
int runs = 100, itemsToSort = 10000, x = 0;
int y = 0;//You can comment this out/delete if you don't need to verify that the sort worked correctly
float sum;
//The first sort that is run. It just sorts items in blocks of two. For instance, an array of 5 elements
//2,5,7,3,1
//would sort into blocks as 2,5 | 3,7 | 1
[BurstCompile]
struct TwoItemSort : IJobParallelForBatch
{
public NativeArray<int> sort;
public void Execute(int start, int length)
{
if (length < 2)
return;
if(sort[start] > sort[start + 1])
{
int firstValue = sort[start];
sort[start] = sort[start + 1];
sort[start + 1] = firstValue;
}
}
}
//After TwoItemSort has sorted the items in the array into blocks of two, run this job to merge the two-item
//blocks into 4-item blocks (batch size = 4, sort threshold = 2), then run again to merge the 4-item blocks
//into 8-item blocks (batch size = 8, sort threshold = 4). Keep repeating this cycle until all items have been
//sorted.
[BurstCompile]
struct MergeSort : IJobParallelForBatch
{
[ReadOnly] public NativeArray<int> input;
[WriteOnly] public NativeArray<int> output;
public int sortThreshold;
public void Execute(int start, int length)
{
//A merge is performed on a batch made of a sorted left half and a sorted right half (for instance,
//an 8-item batch has a left half 4 items long and a right half 4 items long).
//However, it's possible for one batch per job to be shorter than the full batch length (for instance, if you were
//sorting an integer array of length 5 and the partition size for the first merge job was 4, you'd have two batches:
//one of length 4, and one of length 1).
//Because of this, it's possible to have a batch that has a left side (maybe not a full left half) and
//no right half. In these instances, merging the batch would be redundant; the left half is already sorted.
//This check makes sure we only perform the merge on batches with both a left and a right half.
if (length <= sortThreshold)
{
//indicates the items are already sorted
int max = start + length;
for (int m = start; m < max; m++)
output[m] = input[m];
return;
}
//We have two halves, and it's guaranteed that the left half and right half
//are already sorted. Because of how IJobParallelForBatch works, we also know that the
//left half of each batch will have at least as many items as the right half.
//We can use that fact to our advantage.
int end = start + length;
int rightHalfStart = start + sortThreshold;
int i = start, j = rightHalfStart;
for (int k = start; k < end; k++)
{
//is the current item from the right half less than the current item of the left half?
//we put the right half item on the left side of this comparison so that when the two
//items are equal, the left half item wins and is slotted first, thus preserving the
//order of the items. This isn't really necessary when dealing with integers, but may be
//important when sorting other things.
if (input[j] < input[i])
{
output[k] = input[j];
j++;
}
else
{
output[k] = input[i];
i++;
}
//Since the right half is possibly smaller than the left half, this case should be
//more likely, so check for it first.
//If this statement is true, it means all items from the right half have
//been sorted and only left half item exist. Since the remaining left half items
//are already sorted, we can just do a copy of the remaining items
if (j == end)
{
//k is still pointing to the last sorted item, so start by incrementing it.
for (k += 1; k < end; k++, i++)
output[k] = input[i];
return;
}
//If this statement is true, it means all items from the left half have been sorted and only
//right half items remain. Since the right half is already sorted, we can just copy the remaining
//right half items (indexed by j) into the output.
else if (i == rightHalfStart)
{
//k is still pointing to the last sorted item, so start by incrementing it.
for (k += 1; k < end; k++, j++)
output[k] = input[j];
return;
}
}
}
}
protected override JobHandle OnUpdate(JobHandle inputDeps)
{
if (x == runs)
{
Debug.Log("Average sort time = " + (sum / runs * 1000f) + "ms");
x = 0;
sum = 0f;
}
x++;
NativeArray<int> a = new NativeArray<int>(itemsToSort, Allocator.TempJob, NativeArrayOptions.UninitializedMemory);
NativeArray<int> b = new NativeArray<int>(itemsToSort, Allocator.TempJob, NativeArrayOptions.UninitializedMemory);
for (int i = 0; i < a.Length; i++)
a[i] = Random.Range(0, a.Length - 1);
//Used to measure how long the sort took. This might not be the best way but oh well.
float startTime = Time.realtimeSinceStartup;
bool aIsInput = false;
if (a.Length >= 2)
{
inputDeps = new TwoItemSort() { sort = a }.ScheduleBatch(a.Length, 2, inputDeps);
int partitionSize = 2;
//We are checking to make sure the previous sort job's partition size is less than the
//length of the data to be sorted.
while (partitionSize < itemsToSort)
{
//switch back and forth between a and b as input (whichever one is input, the other will be output).
//This is done because the sort job takes in an input array, and the sorted values are placed in
//the output (because the input array cannot be overwritten for the sort to work correctly).
//The next job then merges two sorted partitions from the previous step, which are in the output. So
//this next job needs to have the output from the previous step as its input. We could copy the output to the
//input after each job, but that is a waste when we can just switch the arrays back and forth between being used
//as input and output.
//We do the reversal before the job is run so after all jobs are run it is clearer which array (a or b) has the
//fully sorted values. If aIsInput is true, b will have the values, otherwise a will have them. This is also why
//we set aIsInput to false initially, since for the first run we want it to be true.
aIsInput = !aIsInput;
//Double the partition size for the next sort job
partitionSize *= 2;
inputDeps = new MergeSort() { input = aIsInput ? a : b, output = aIsInput ? b : a, sortThreshold = partitionSize / 2 }.ScheduleBatch(a.Length, partitionSize, inputDeps);
}
}
inputDeps.Complete();
float timeSortTook = Time.realtimeSinceStartup - startTime;
sum += timeSortTook;
if (aIsInput)
{
a.Dispose();
//uncomment to verify that the sort worked correctly
//if (y++ == 0)
//{
// for (int i = 0; i < b.Length; i++)
// Debug.Log(b[i]);
//}
b.Dispose();
}
else
{
b.Dispose();
//uncomment to verify that the sort worked correctly
//if (y++ == 0)
//{
// for (int i = 0; i < a.Length; i++)
// Debug.Log(a[i]);
//}
a.Dispose();
}
return inputDeps;
}
}
Here I am sorting 10000 integers, randomly assigned. The sort time is measured using Time.realtimeSinceStartup, which might not be the best way, but it was easy to implement so I went with it. This approach yields an average sort time of 0.5 ms. I feel this is probably pretty good, but I really have nothing to compare it to (a non-parallel sort, for instance).
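For what it's worth, the averaging scheme used above (accumulate seconds over N runs, then divide by the run count and convert to milliseconds) can be sketched outside Unity in plain Python. `average_ms` is a hypothetical helper of my own, and timings from it obviously won't reflect Burst or job-system performance; it only illustrates the measurement pattern and gives a way to get a single-threaded baseline.

```python
import random
import time

def average_ms(sort_fn, runs=100, n=10_000):
    """Average wall-clock time of sort_fn over `runs` runs, in milliseconds."""
    total = 0.0
    for _ in range(runs):
        data = [random.randrange(n) for _ in range(n)]
        start = time.perf_counter()
        sort_fn(data)
        total += time.perf_counter() - start
    # seconds -> milliseconds, same shape as the post's sum / runs * 1000f
    return total / runs * 1000.0

# e.g. average_ms(sorted) times the built-in sort as a baseline
```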
Edit: I just discovered NativeArray has a Sort extension method. I went ahead and tested this in an IJob (Burst-compiled, of course), and this method takes around 0.65 ms. So, if you expect to have fewer than 10000 items, it's probably easier to use the Sort method, though if you need every performance boost you can get, parallel sorting is the way to go. Also, we can probably push the parallel sorting times even lower.
Please feel free to critique the sorting code, measurement code, or anything else. Is there anything that can be changed to provide even better results? This is my first foray into parallel sorting so it’s very possible there is a better way to do this!
Has anyone else experimented with parallel sorting in the context of the job system? What are your results? Please feel free to post your code. I'd like this thread to serve as a resource for anyone looking to implement a sorting system.