Depth Sorting of Billboard Particles, how can I do it?

I have 3D Particles, I am trying to find a way to sort these particles by Depth, so that they're rendered in the correct order(from back to front);

But.. having 10,000 particles, tends to slow sorting down, naturally..

Anyone know of any fast Depth Sorting algorithms?

Or how about a way to use ZWrite, which appears to effectively sort them automatically, but has clipping issue, because of a particles texture transparency, and the order at which the particles were drawn.

Enabling ZWrite has sped things up ALOT, to bad for the clipping issues.. Using a sorting algorithm, I see no clipping issues, but then have no need for ZWrite, and so its still simply a problem of proccessing speed.. I've looked into GPU sorting, but I'm not sure if it can be done with the Indie version of Unity3D;

So.. I'm just stuck now, trying to figure this out.. and getting no where, anyone have any suggestions?

Sorting: The best sorting algorithm for particles is often Bubble Sort. Yes, the worst naive approach to sorting actually has a practical application! Here's why:

1. It monotonically approaches the correct order. Unlike Quicksort, Bubble Sort doesn't do any crazy shuffling. This means that you can limit its running time per frame and your array will still be in better order than when you started.
2. It runs very well in the best case. For a sorted array, Bubble Sort has a O(n) running time. This is great, because usually your particles were already sorted in the previous frame. With pre-sorted arrays, Quicksort still runs in O(nlogn), or O(n^2) if it's a naive implementation. Up to a certain degree of "almost" sorted, Bubble Sort can still outperform Quicksort. Particles are usually mostly sorted, so you usually get best or almost best-case scenarios.
3. It runs in-place. Quicksort can be written to do this too, but a lot of faster sorting algorithms need a bunch of extra memory to run.

I recommend setting up bubble sort with a limit of how many passes it can do per frame. Perhaps use Quicksort for the first frame so you have a good starting point. For 10k particles, let Bubble Sort do 5-10 passes per frame. You can trade robustness for performance really easily by tweaking the number of passes per frame. Even if your particles aren't perfectly sorted, they'll often look fine.

Z Buffering: For transparent particles, you don't want to use Z writing. Z writing is for when you want a rendered object to occlude things rendered later. Even if all your particles are on top of one another, you always want to draw all of them.

After all that: I should mention that your frame rate problems are probably not due to sorting or vertex processing, but fill rate. To test this theory, try generating an equivalent mesh just once and seeing how it performs with 10k, 20k, 30k particles. Just use a normal particle shader that doesn't do the billboarding itself, and uses the same blending coefficients. My guess is that your performance will drop immensely anyway.

Well sorting 10k particles is pretty much a folly and rarely ever needed. Normally the sheer number of particles and using blending gets around the issue. Are you sure you need sorting? What are the particles you are rendering?

Off hand the only alternative I can think of is to use the same system that vegetation/trees often use in games these days, which is to use alpha testing.

Alpha testing allows you to write into the depth buffer (thus giving you some sorting), but unfortunately its a boolean operation. The pixel is either drawn or not, there is no in-between, like alpha blending. This is often minimised by simply re-drawing the particles a second time, this time with zwrite off, alpha testing off and blending on, to give softer edges.

However it wont work for blended particles as you wont get the same levels of over-draw, which you normally need with particles to build up density as the depth buffer will prevent particles further away being rendered.

Its important to note that this method cannot guarantee perfect sort order appearance, but its usually good enough in the case of vegetation not be too noticeable, whilst at a greatly reduced cost compared to sorting polygons.

ANswer 2: Actually thinking about it, you could probably use a billboard system type shader, something that takes just a single vertex per particle to define its position in space. You can build the billboard from that and some other data (view direction etc). So then its just a case of sorting 10k vertices and how to pass that data into the shader.

Although it sounds a lot I'm guessing it shouldn't be too bad if you find the right sort algorithm and make use of temporal consistency, where by you should only need to sort every few frames as the view position to the particles doesn't change that greatly between single frames.

The best algorithm for already sorted arrays is usually InsertionSort, It has lower overhead than bubblesort. Also Heapsort is really great and many particles engines uses only heapsort.

I’m my current implementation I’m using exactly both algorithms. The nice thing is that Particles usually don’t spawn all at once, but are added gradually so I mostly using InsertionSort (I profiled and turned out to be the best).I Always keep less than 1k-2k particles, if Insertion sort perform worst case for 1k-2k the little freeze down will not be noticeable and the next frame the array is already sorted and the freeze will not occur again. In my tests heapsort was faster (also than quicksort) but ended up to be not necessary in my app.