Hey, I’m just wondering if anyone has worked on a Burst compiled DOTS generic Quadtree? (Insert, Remove, GetAllObjectsInRect, Clear… nothing out of the ordinary.)
I’m often wanting to track where different objects are spatially, so I can look-up nearby objects quickly. Since objects move around, I’m constantly chucking out-of-date Quadtrees away and rebuilding them from the latest set of positions… and this seems to be a perfect use-case for ECS and Burst and Jobs.
I’m checking if anyone has had a crack at this already because while I’ve done some work with DOTS, I’ve a feeling that a properly implemented DOTS Quadtree has to implement it as a Native Collection, ala Jackson Dunstan’s work, something that is a bit beyond my understanding of the underlying systems right now!
If anyone is working on the same problem and wants to get particularly fancy, I would also take a look at this recent paper from the University of Idaho which investigates parallel construction of quadtrees across multiple cores. Some clever ideas there, and it all seems ripe to be Jobified.
(Antypodish was working on an Octree which is a little more complicated than I’m looking for, concerned as it is with ray intersections. Taking a look at the code itself, I’m also spotting [Inject] labels, so it’s using the old API.)
I appreciate anyone sharing their progress in the same problem space!
Unity.Physics has a custom BVH implementation for collision and spatial queries. You can easily make a QuadTree : IDisposable and a Node and have a NativeArray<Node> or NativeList<Node> as an internal member variable of the QuadTree. Attributes might be a little quirky, but it will at least unblock you.
I can’t help you much with the actual quadtree implementation though. I’ve always used box pruners for this type of problem, which I find more intuitive to implement in a jobified manner.
@Matt-Cranktrain hi, thx for mentioning
If you use at the front of nick name, I would be even notified promptly.
Just luck I was passing by.
Yes I used Entities package, when Injection was a thing. I haven’t updated it publicly since then. Sorry for that.
I have learned few thing however, where I could improve in future.
Also, possibly you have find out, I use entities buffers. That specific requirement toward my project.
But indeed, it may be a bit overkill for what you need.
What I can suggest, look into github, and search for quadtrees in Unity/C#. I think I have see some. But haven’t looked for them specifically. In worse case scenario, look in other languages.
Building quadtrees is possible in multi threading approach, but I think main challenge is, how to arrange data allocation, across threads, to avoid race conditions.
Another matter is, do you really need multithreading, for building quadtree? Is that something you expect 1k+ inserts / removes per frame on multi depth? I don’t know your application of course, however, would simple grid be insufficient in your case? I am asking, as if you seen this tech talk vid with minions battle., and it may fit your requirements?
Unite Austin 2017 - Massive Battle in the Spellsouls Universe
Is this exposed in the Physics API at all? I’m not an expert on all the different methods, so perhaps I’m missing something.
Do you have any favourite documentation/links on multi-box pruners? I know Quadtrees pretty well, but Broadphase physics stuff is not a thing I’ve ever had to look into.
@Antypodish , I think your most important question is:
I do have ~1000 objects that are moving around, so I have to rebuild the Quadtree regularly, and I have to do spatial look-ups every frame.
My current solution is to build the Quadtree on another thread using, of all things, CeilaSpike’s Thread Ninja (a solution I settled on years before ecs/DOTS was a thing!) The advantage is there’s no hit to my framerate, and if Quadtrees take a couple of frames to build then who cares because the accuracy is close enough for it not to matter. The disadvantages are: garbage collection overhead when the whole thread is suddenly recycled. Also, I’ve no idea how cross-platform that old Thread Ninja asset really is! It’s a solution that works for me, but it would be a shame not to make use of DOTS here.
Perhaps, as you say, an improvement would be a simple look-up grid built in DOTS with. It might be slower than a DOTS quadtree, but faster than what I have right now.
Thanks for the suggestion, something for me to think about there!
The Quadtree implementation I’m currently using is actually this one from Github, but with a few little changes to reduce Allocations and garbage.
Thanks for the thoughts guys, I’ll have a think about a DOTS-based grid/bucket system, and if I can prototype something then perhaps I can also share the performance improvements in this thread too!
I read briefly in to this quadtree.
Few thing you can do. First concentrate to allow burst.
Second, leave it for now in single threaded. Just use job. It should allocate spare thread automatically.
Then you need replace all vectors to floats.
And then, figure out, how to take out List, and replace with an array.
You may want preallocate array size.
List .Add and AddRange, is what kills its performance here.
Alternatively you could use buffer array of entities, if you want to go that route.
Buffer is more elastic, when comes to resizing. But make sure, you resize it as few times as possible. Specially when capacity / size approaches magic thresholds of 2,4,8,16,32 … 1024 etc.
If you got array big enough, you just play with index offset, to allocate new elements.
Also, in linked code, I see m_storedObjects.Clear(). I in fact don’t use clear at all in my octree. Just simply resize unallocated length to 0, or set index to 0. Whenever suitable. That is not a problem, as new indexes and data will be overitten, when next indexes of array/buffer will be used.
In my octree, I have fixed size of allowed element per each node. So I don’t need resize arrays, as in above code example. I run each octree creation / modification on single thread. But I can run multiple octrees in different threads.
However, I can search octree on different threads, when no modification happens.
The queries are exposed. The BVH builder is more internalized (though the source code is all visible). It rebuilds the entire BVH for dynamic entities every frame, but it also builds that BVH pretty fast.
Conceptually, box pruning is just a specific method of the Sweep and Prune (also referred to as Sort and Sweep) where by sorting your data spatially, you can then identify nearby objects by simply iterating adjacent objects in the sorted list. You typically sort all three (or two) axes in separate lists and use some scratch table, compress the spatial axes into a single parameter using a z-order curve, or just sort and iterate one axis and brute force the other two (or one) axes, which is what box pruning usually refers to.
Implementation-wise, there’s lots of optimization tricks presented in this series: http://www.codercorner.com/blog/?page_id=1632
Many of those tricks also can be applied to Burst, and fortunately the code looks a lot cleaner in C#!
What most people forget about box pruning is that you can “inflate” the search range by adding and subtracting constants to each min and max.
If you really have lot’s of moving object, I would not use a quadtree as it is not build for adding/removing objects every frame. Go with a spatial grid, easy to DOTsify and also much better for objects which move around.
I haven’t looked into the latest DOTs examples, but the boids example had a very clever spatial position implementation about a year ago.
That said, for a lot of static objects it might be still an option, but the data structure of quadtrees is by nature not very favourable for multithreading imho.
For anyone else wanting to have a crack at a Quadtree, a Quadtree represented entirely in a preallocated array is ‘Linear Quadtree’, that’ll help with Googling.
I’ve been investigating this with two comparable test, one with my existing Quadtree, and other with a simple DOTS solution. First, here’s a Quadtree with 10,000 cubes in it, which move around the available space:
Red lines show the calculated Quadtree. Clicking anywhere does a Cube lookup within a circular radius of the clicked point and then… sets all the Cube’s X position to zero, because that was the easiest thing to do to prove it was finding the right cubes! The Quadtree is being recalculated every single frame.
Next I tried a straight-forward DOTS setup with every cube’s position hashed from a grid. Again: 10,000 cubes, red lines are the debug of the grid:
Implementation details: I looked into how the Boids demo was doing the position hashing, liked what I saw, and copy/pasted it:
public static int GetHashedPosition(float2 position, float cellRadius) {
return (int)math.hash(new int2(math.floor(position / cellRadius)));
}
[BurstCompile]
[RequireComponentTag(typeof(NewQuadtreeThing))]
struct HashPositions : IJobForEachWithEntity<LocalToGrid> {
public NativeMultiHashMap<int, int>.ParallelWriter hashMap;
public float cellRadius;
public void Execute(Entity entity, int index, [ReadOnly]ref LocalToGrid pos) {
var hash = GetHashedPosition(pos.Position, cellRadius);
hashMap.Add(hash, entity.Index);
}
}
My game is not pure-ECS, so there is no doubt some overhead as I’m boxing up the position into my LocalToGrid entity, and then at the lookup end I’m performing a Dictionary lookup via EntityID to get the results from the hashMap. This click-test to lookup all Cubes within a radius would be muuuch faster when jobified and Burst compatible, I am sure, but I’m not in a position to rewrite my game with DOTS from the ground up, so this is a hybrid situation.
Results Time! How did these two similar tests compare?
Building the Quadtree (non-DOTS) is about 200 to 250 times slower than building the hashmap of hashed positions within the grid (DOTS). Building that hashmap across four processor cores is, for my purposes, essentially free to compute, even with thousands of entities. DOTS is fast, who knew! The added benefit is the nice disposing of collections within the System leads to a bonus performance increase from the lack of nasty garbage collection happening on Quadtree instances.
But! Lookups are three times faster in the old Quadtree system than in my new Hashmap system. (Again: a DOTS-based lookup would be faster.) So it’s a trade-off. Do I want lightning-fast spatial map building, but slightly slower lookups? I mean, yeah, probably. But I can imagine situations where a game would need the Quadtree solution instead.
You know what would be stunningly fast? A DOTS-based Quadtree. Aaaand we come full-circle back to the beginning of the thread! I still think that University of Idaho paper I linked in my first post is where the cutting-edge of the parallel construction of spatial lookup structures is, and someone who knows the DOTS API much better than I do could absolutely kill my benchmarks with a sensible implementation of that.
I’ve spent all the time I can on this problem right now, I’m going to bring my new DOTS-Hashmap-Grid system into my proper game and see what performance increases I experience in the Real World. If I can, I’ll report back what I find from my real game.
This is nice brake down of the subject.
I am really curious, if initial implementation in OOP, after conversion to DOTS and burst it, could lead to potential 100x improvement. Even on single thread. In the end, burst is amazing
But that just working on arrays, or relevant, rather hashmap.
What time duration is taken, for each case? We assume building duration in editor is excluded from timing.
In our game we build hierarchial spatial hash map, it’s builds faster than quadtree and lookups faster than one layer spatial hash map (cause it divide more granular and we can discard big empty areas earlier). We decided use 4 layered spatial hash map. with cell sizes 40x40 20x20 5x5 1x1. It feels like something middle between quadtree and spatial hash map, but getting good from all of them. We use it for enemy search and it works pretty well.
Wow, this is looking really impressive - Bulk inserts of 20,000 elements in 0.7ms, a thousand lookups in 1.5ms… looks like you’re getting some ridiculously excellent performance already. I’m really looking forward to seeing more!
No updates really, the content of that post you quoted is still my experience with the partition system I describe, and it works great in my actual game too. It does what I need!