Hi all, sorry if posting to the wrong forum, I couldn’t find a more appropriate one. Also it’s a big’un. I promise there’ll be a joke or two.
Bit of context :
My app is displaying very large point-cloud data that is selected by users at runtime. I’m rewriting the rendering of these point clouds to do as much of the work as possible off the main thread.
I have an approach that works pretty well, where I maintain a very large compute buffer with “slots” that hold the compressed data of nodes currently in the frustum, then have one call to Graphics.RenderPrimitives(…); draw the whole pointcloud in one drawcall. The vertex shader decompresses the point data, and a geom shader emits a quad. It’s great.
Using Unity 2021.3.7f1 at the moment.
The problem at hand :
The bottleneck of this approach is of course copying arbitrary data to the large compute buffer, in small batches that go to different offsets.
All the documented ways to get data on the GPU that I found have some dependency or other on the main thread, which I really wish I could sidestep, as the display algorithm already deals with data being “on its way” in a threadsafe way.
The things I tried :
- bigBuffer.SetData(bytesReadOffThread, offset, size); Works great, chokes the main thread.
- Pairs of bigBuffer.BeginWrite<>(); and bigBuffer.EndWrite<>();
  - if I write data on the main thread : Works great, chokes the main thread a little less.
  - if I off-thread the returned NativeArray to copy the data, then call EndWrite on the main thread : blazing fast (well, fast)! But I can only have one copy operation per frame (can’t call BeginWrite multiple times).
- Updating to Unity 2022.3 ( that… didn’t go well (link)) and shifting to GraphicsBuffers so I can use a combination of GraphicsBuffer.LockForWrite and Graphics.CopyBuffer(). ლ(ಠ益ಠლ)
- current approach and bug below
The bug I’m encountering :
My current approach is a hybrid one where I have a multitude of in-flight BeginWrite<> operations to smaller compute buffers, which I call “source”, and then use a very small compute shader to copy those to the main one. The small buffers are part of a pool that I access in a thread-safe way. Here is the compute shader (insanely simple) :
    #pragma kernel Copy

    RWStructuredBuffer<int> destination; // the big buffer
    StructuredBuffer<int> source;        // one pooled small buffer
    int dataOffset;                      // start of the target slot in destination, in ints

    [numthreads(64,1,1)]
    void Copy (uint3 id : SV_DispatchThreadID)
    {
        const uint idx = id.x;
        destination[dataOffset + idx] = source[idx];
    }
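For reference, a minimal sketch of the C# side that would drive a kernel like this. The class, method and parameter names (SlotCopier, Copy, dstOffsetInInts, intCount) are hypothetical and only there for illustration; the Unity calls themselves (FindKernel, GetKernelThreadGroupSizes, SetBuffer, SetInt, Dispatch) are the standard ComputeShader API. It assumes the element count is a multiple of the thread group size, since the kernel has no bounds check.

```csharp
using UnityEngine;

// Hypothetical helper, not the actual project code.
public sealed class SlotCopier
{
    private readonly ComputeShader copyShader;
    private readonly int kernel;
    private readonly uint groupSizeX;

    public SlotCopier(ComputeShader copyShader)
    {
        this.copyShader = copyShader;
        kernel = copyShader.FindKernel("Copy");
        // Read the [numthreads] X size from the shader instead of hard-coding 64.
        copyShader.GetKernelThreadGroupSizes(kernel, out groupSizeX, out _, out _);
    }

    // Main thread only. Assumption: intCount is a multiple of groupSizeX,
    // because the kernel does not guard against running past the end.
    public void Copy(ComputeBuffer source, ComputeBuffer destination,
                     int dstOffsetInInts, int intCount)
    {
        copyShader.SetBuffer(kernel, "source", source);
        copyShader.SetBuffer(kernel, "destination", destination);
        copyShader.SetInt("dataOffset", dstOffsetInInts);

        int groups = intCount / (int)groupSizeX;
        // Dispatch only queues the work; it returns before the GPU has run it.
        copyShader.Dispatch(kernel, groups, 1, 1);
    }
}
```

With 200KB sources that would be 51200 ints, so 800 thread groups per copy.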
This works reasonably well, until at some point the big compute buffer gets wrong data. Specifically, it seems to get a copy of data written to the same “source” small buffer on later frames. I was convinced that a computeShader.Dispatch() call would always finish on the current frame, but it seems they can span large timeframes. I can’t for the life of me find a way to enforce coherence, or at least get notified when coherence is ok.
Mitigation :
- a call to source.GetData() seems to enforce coherence, at the cost of speed (a lot)
- Putting the source buffers on “timeout” (about a second) seems to avoid the problem, but it is very brittle, and it means having a lot of them hogging resources. For context, “bigBuffer” is 2GB, and sources are 200KB (for now, but I need to tweak that at some point).
- using a brand-new source buffer for every iteration. This works, but it seems like a waste of resources (I haven’t profiled yet, but all these new() make me uncomfortable). Also, I need to put the buffers on timeout for an indeterminate amount of time between Dispatch() and Release(), otherwise the bigBuffer receives all zeroes. (╯°□°)╯︵ ┻━┻
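A fence-gated variant of the “timeout” idea could look roughly like the sketch below. It is untested and rests on assumptions: that Graphics.CreateGraphicsFence with GraphicsFenceType.CPUSynchronisation behaves as documented on the target platform (check SystemInfo.supportsGraphicsFence first), and that a fence created right after Dispatch() really covers that dispatch. FencedSourcePool and its methods are hypothetical names.

```csharp
using System.Collections.Generic;
using UnityEngine;
using UnityEngine.Rendering;

// Hypothetical pool: a source buffer only becomes rentable again once the GPU
// has passed a fence inserted right after the Dispatch() that read from it.
public sealed class FencedSourcePool
{
    private struct InFlight
    {
        public ComputeBuffer Buffer;
        public GraphicsFence Fence;
    }

    private readonly Queue<InFlight> inFlight = new Queue<InFlight>();
    private readonly Stack<ComputeBuffer> free = new Stack<ComputeBuffer>();

    // Seed the pool with buffers that have no pending GPU work.
    public void Add(ComputeBuffer buffer) => free.Push(buffer);

    // Main thread, immediately after computeShader.Dispatch(...) using `source`.
    public void MarkDispatched(ComputeBuffer source)
    {
        GraphicsFence fence = Graphics.CreateGraphicsFence(
            GraphicsFenceType.CPUSynchronisation,
            SynchronisationStageFlags.ComputeProcessing);
        inFlight.Enqueue(new InFlight { Buffer = source, Fence = fence });
    }

    // Main thread, once per frame: move finished buffers back to the free list.
    // Fences are created in submission order, so stop at the first pending one.
    public void Recycle()
    {
        while (inFlight.Count > 0 && inFlight.Peek().Fence.passed)
            free.Push(inFlight.Dequeue().Buffer);
    }

    // Returns false when every buffer is still in flight.
    public bool TryRent(out ComputeBuffer buffer)
    {
        if (free.Count > 0)
        {
            buffer = free.Pop();
            return true;
        }
        buffer = null;
        return false;
    }
}
```

The intent is that BeginWrite<>() is only ever called on a buffer returned by TryRent, so the pool grows with the actual GPU latency instead of a guessed timeout. Whether this actually cures the stale-data problem is exactly the open question.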
Question(s) :
I’m willing to revisit my approach, so is there an API I didn’t find that would
- let me update my bigBuffer from off the main thread?
- enforce coherence on a ComputeBuffer after a bunch of computeShader.Dispatch() calls? (some kind of fence maybe? I can’t find the proper docs, as this(link) is pretty terse and I need an example)
Here are things I thought of but haven’t tried yet :
- messing with the internalPtrs (somehow) so that the source buffers actually are “views” into the big one
- writing a dll that does the whole thing, bypassing Unity’s thread locks where necessary
- Using something like Graphics.CopyBuffer (but GraphicsBuffers don’t seem to have any fast/off-thread way to upload data)
- asking the forums ಥ﹏ಥ
-
edit : typos
Obligatory trail of messed-up syntax that appeared out of nowhere while editing this behemoth!
ಠ_ಠ -

