Hello! While doing an optimization pass, I learned that a Native array sort (using a custom IComparer struct) is one of the slowest parts of my Bursted code. I’m looking for advice about how improve things. Here’s some details about my test case:
I’m iterating over 10,000 entities over 8 threads in parallel, using one Entities.ForEach job.
Each entity has a buffer with 40 elements that each need to be sorted.
I’m calling the NativeArray.Sort() extension, and passing in a custom IComparer struct to do the sorting.
The Compare() method uses a multi-step if block to do the sorting. I thought this might be a target for optimization, though removing all of these blocks (and taking the result with a simple int comparison) had seemingly no effect on performance.
Removing the Sort() call altogether shaved 8ms off of my frame time.
I am surprised that 10k sorts of 40 elements is a bottleneck, and it suggests I’m doing something wrong. I’m hoping there are good steps that can be taken.
Anyone else have luck optimizing their Sorts? There are a few other threads here about sorting, but the solutions mostly deal with no bursted code.
With 400,000 elements I would suspect you are bandwidth-bound. How big is the IBufferElementData? You might gain some time sorting ranks and then applying them if the element size is big.
Without knowing much about your problem, change-filtering is the first thing that comes to mind as an area to get things smoothed out.
I could be wrong here - I ruled out IJobParallelForFilter, because order is important here. I’m not sure how to get away from comparing the elements to each other. Please call out if I’m wrong.
To eliminate needing to move data in the NativeArray itself, I’m considering generating a second, NativeArray which holds sorted indices to the source array. Maybe I could compare elements in the source array, without needing to rearrange them…
Then use the sorted indices array to randomly access the source array…hmm, I wonder if there would be a hit there.
Not what I am talking about at all. Does the buffer need to be resorted if elements in the buffer were not touched for a particular frame? If not, then you can use a change filter on your buffer type to only sort buffers in the chunks that were potentially modified.
This should be part of your sorting routine if you struct is large. Do an initial pass to encode the struct’s sorting parameters into an integer key. Then sort the key-index pairs. And lastly, use the key-index pairs to rearrange the buffer. Unless your buffer is initially nearly sorted, this reduces the number of swaps of buffer elements required.
Unfortunately, I am already filtering for this in the OP.
I believe this is the best candidate for what’s going on, but I don’t have a good benchmark to know if my struct is large enough that problems would be expected. My struct holds:
5 ints
1 Entity
1 enum value
Is that a small size for sorting? Large? I don’t know - Do you?
Thank you both very much. Looks like I have my work cut out for me now.
just to remove confusion, I want to specify that I’m not sorting an array of 400k elements. I’m sorting 10,000 arrays of 40 elements each, spread across 8 threads. Each thread is doing 1,250 of these sort operations in parallel.
Maybe make a custom sort implementation? I did that for my own system and got quite a few milliseconds off by specifying exactly what to sort and how. I tried various but QuickSort worked the best for my case (20-100 elements per array, just float2s).
Thank you Nayanpas! I had run across your posts in an earlier sorting thread, but at the time your quicksort wasn’t Bursted (or at least, you had mentioned you weren’t using Bursted code).
did you ever get it working with Burst? Were you able to do comparison tests between different sort approaches with Bursted code?
I am not there yet because I believe in first making optimised code and then moving it to burst when even the regular monobehaviour is fast enough.
That being said, it’s all done using Unity.Mathematics and in static methods so putting it in a job shouldn’t be a problem. I just wish I wadn’t doing it in my spare time or else I’d be done with this by now.