Hi,
I reported the following issue 1311418 regarding the use of ComputeBufferMode.Dynamic not functioning as intended on D3D11, and got the following response: [quote]
Immutable buffers should be used. It’s a doc and naming bug. Immutable buffer is actually the most mutable of them all. Dynamic buffers are meant only for meshes etc, as they’re CPU visible. Users shouldn’t need to change the buffer mode at all. The default one works on C# side. We’re delaying the doc fix for now.
[/quote]
I’m wondering if I could get some clarification on how ComputeBufferMode.Immutable functions, as it apparently is not immutable as the documentation states. I’m also wondering how we can use ComputeBuffer.BeginWrite in D3D11 if we’re only supposed to use ComputeBufferMode.Immutable?
ComputeBufferMode is not really meant for external users to change. While the default value in the constructor is called Immutable, the name is hugely misleading. Internally it behaves basically the way a buffer in OpenGL would (so anything goes). Dynamic and the other modes are for internal things, like keeping a mesh in CPU-accessible memory.
If we really twist the wording of the docs (“Compute shaders and other GPU operations are allowed to modify the contents of the buffer.”), then, well, an upload is a GPU timeline operation performed by the GPU, so yeah, SetData works just fine.
ComputeBuffer.BeginWrite is only for the subupdates buffer mode. It’s currently used by the hybrid renderer. The idea is to have mapped buffer that one can write directly into from Burst jobs, saving a memcpy. If you’re not doing something like that then just the SetData is fine.
However, if you do use it, remember that you cannot write into an area that’s still in use by the GPU. And we have no way of asking from the C# side whether the GPU is done with a specific frame. So it will work by pure luck.
Until we get around to actually making externally usable buffer modes, which I hope we get a chance to do sooner rather than later, it’s better to just ignore the parameter and just use the default.
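For reference, a minimal sketch of what “just use the default” looks like in practice, assuming a MonoBehaviour that re-uploads a float array every frame. The class name `DefaultModeUpload`, the element count, and the data written are placeholders for illustration, not anything from the engine or this thread.

```csharp
using UnityEngine;

// Hypothetical example component: uploads CPU data every frame with SetData,
// using the default buffer mode (the one named Immutable).
public class DefaultModeUpload : MonoBehaviour
{
    const int Count = 1024;          // placeholder size
    ComputeBuffer buffer;
    float[] cpuData;

    void OnEnable()
    {
        // No ComputeBufferMode argument is passed, so the default mode is used.
        buffer = new ComputeBuffer(Count, sizeof(float));
        cpuData = new float[Count];
    }

    void Update()
    {
        // Fill the CPU-side copy (placeholder data).
        for (int i = 0; i < Count; i++)
            cpuData[i] = Mathf.Sin(Time.time + i);

        // Upload. Per the discussion above, this works with the default mode
        // even when called every frame.
        buffer.SetData(cpuData);
    }

    void OnDisable()
    {
        buffer?.Release();
        buffer = null;
    }
}
```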
So, outside of being able to use ComputeBuffer.BeginWrite there’s no difference in upload speed (after CPU modification) between the different modes as the documentation suggests? It doesn’t act like the Static, Dynamic, or Stream hints in OpenGL (or other rendering APIs) about what sort of memory should be used?
One of the things I want to use ComputeBuffer.BeginWrite for is to modify the bufferWithArgs that’s used in CommandBuffer.DrawProceduralIndirect. We do a lot of indirect drawing with bufferWithArgs that’s created on the CPU. The way we do it now is to have a CPU version of bufferWithArgs that we work on in jobs, and once we’re finished working we upload it to the GPU using ComputeBuffer.SetData. My hope was that we could work directly on the memory using ComputeBuffer.BeginWrite so we could avoid the memcpy of ComputeBuffer.SetData on the main thread.
From what you wrote regarding not knowing if the GPU memory is in use or not, this doesn’t sound like something we can achieve with ComputeBuffer.BeginWrite at the moment? At least, not without introducing more complexity like compute shaders and potentially some GPU staging buffers?
Does this all mean that ComputeBuffer.BeginWrite is something that isn’t really ready to be user-facing yet? Like, are there any use cases for it outside of what’s done in the Hybrid Renderer?
Lastly, if I remember correctly (it’s a while since I wrote this issue), the sub updates buffer mode “didn’t work” in D3D11 either?
There is a big difference with the SubUpdates buffer (not Dynamic; on most APIs that’s just a normal buffer, except on D3D11): it’s CPU-mappable memory in most cases. As an example, in Vulkan it’s just host memory that the GPU can read directly. Reading from it is slower on the GPU, but if you need to perform an upload every frame regardless, it can make sense to use it.
Unfortunately you can’t be safe with ComputeBuffer.BeginWrite at all for now. We will bring an API Sometime™ in the future to do the syncing, but I cannot promise anything definite. If you, and preferably many others, pester us about it enough it will tell us that there is demand for it so the priority can be raised.
SubUpdates works perfectly with D3D11, but within its limitations: the only way to write into a SubUpdates buffer is via ComputeBuffer.BeginWrite. Sure, on some APIs SetData can work, but as we don’t have a validation layer yet, we can’t really tell at runtime whether it’s working by accident or not.
I see. Thanks again for taking the time to properly clarify this, it’s very helpful.
Out of curiosity, how is Dynamic different on D3D11 (that’s the sole API we’re targeting) from the other buffer modes on D3D11, or from the other APIs for that matter?
Also, to summarize so I understand all the information correctly:
ComputeBufferMode.Immutable is not immutable and does not only allow for initial uploads as per the docs, but rather it’s the one you essentially always want to use (even if you call SetData every frame)
Dynamic shouldn’t be used for data that’s frequently modified by the CPU through SetData or BeginBufferWrite (which I assume was renamed to BeginWrite) as per the docs because it doesn’t “work as expected” in all APIs and doesn’t work at all with BeginBufferWrite
Dynamic is not stored in GPU-visible CPU memory as per the docs (from my understanding of your clarification, only SubUpdates is stored like that)
SubUpdates can be used in D3D11 but can only be written to using BeginWrite, even though the documentation says that SubUpdates is “Same as ComputeBufferMode.Dynamic except Unity does not perform any CPU-GPU synchronization” so one would expect to be able to use SetData
There is no safe way to interact with BeginWrite, even though the documentation says that you can use it together with GraphicsFence to implement circular buffers
D3D11 can’t use Dynamic, and can’t safely use BeginWrite, so there’s no way to get any upload speed boosts or anything by specifying the mode
Even though the docs state that “Use this enum to convey the intended usage of the buffer to the engine, so that Unity can decide where and how to store the buffer contents.”, strongly hinting that picking the correct mode will impact performance, users should ignore that the parameter and enum exist, and only use Immutable
Might be that I’m misunderstanding something, but my understanding is that almost all of the most important parts of the documentation on ComputeBuffer are flat out wrong?
D3D11 Dynamic becomes D3D11_USAGE_DYNAMIC which incidentally is also what SubUpdates mode becomes too on D3D11. So it’s write only on CPU and read only from GPU. The dynamic was made specifically to use the D3D11 dynamic buffers. Incidentally the Immutable was made for same reason (D3D11_USAGE_IMMUTABLE) but something went wrong. It’s just a naming issue that ought to be internal only but unfortunately it has leaked a bit.
Correct. Do note that you get Immutable if you just create a ComputeBuffer without specifying any specific mode.
True, it cannot even be used with SetData. Dynamic is write-only from the CPU and read-only from the GPU, and SetData is essentially a GPU timeline copy. You need to map the buffer in order to write into Dynamic, and the only API for that is BeginWrite.
There is no such memory in D3D11 unlike Vulkan, DX12 and Metal. But D3D11_USAGE_DYNAMIC comes the closest so it’s “emulated” via that.
Because it’s D3D11_USAGE_DYNAMIC, the only way to write into it is to map it, thus SetData doesn’t work. If we wanted SetData to work, we’d need to map the buffer in SetData and then perform a memcpy, meaning it would have the perf penalty of the memcpy and also the GPU read speed penalty, giving the worst of both worlds.
That is true. It’s because we don’t actually have a GraphicsFence that can perform CPU-GPU syncs on the public API. It’s Coming™, but as usual I can’t promise any specific timeline.
That is correct. But if you are willing to be brave you can try the SubUpdates and then just fingers crossed hope that perhaps 4 frames is enough buffer. Just please don’t come demanding my head on a stake if it breaks :'D. And most importantly test the perf impact!
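A rough sketch of that “fingers crossed” approach, assuming a ring of SubUpdates buffers where each buffer is only rewritten every fourth frame. The class name `SubUpdatesRing`, the `FramesInFlight` constant, and the data written are all hypothetical; the value 4 is the guess from the post above, not a guarantee from the engine, and nothing here verifies that the GPU has actually finished with a buffer.

```csharp
using Unity.Collections;
using UnityEngine;

// Hypothetical example component: UNSAFE by design, as discussed above.
public class SubUpdatesRing : MonoBehaviour
{
    const int Count = 1024;          // placeholder size
    const int FramesInFlight = 4;    // a guess, NOT an engine guarantee

    ComputeBuffer[] ring;

    void OnEnable()
    {
        ring = new ComputeBuffer[FramesInFlight];
        for (int i = 0; i < ring.Length; i++)
        {
            ring[i] = new ComputeBuffer(Count, sizeof(float),
                ComputeBufferType.Default, ComputeBufferMode.SubUpdates);
        }
    }

    void Update()
    {
        // Pick the buffer that was last written FramesInFlight frames ago
        // and hope the GPU no longer reads from it.
        ComputeBuffer target = ring[Time.frameCount % FramesInFlight];

        // Map the buffer and write directly into it (no SetData).
        NativeArray<float> mapped = target.BeginWrite<float>(0, Count);
        for (int i = 0; i < Count; i++)
            mapped[i] = i;
        target.EndWrite<float>(Count);

        // Bind `target` to your material or command buffer here.
    }

    void OnDisable()
    {
        if (ring == null) return;
        foreach (ComputeBuffer b in ring)
            b?.Release();
        ring = null;
    }
}
```

Whether this beats plain SetData depends on the workload, so it is worth profiling both before committing to it.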
Yeah. That’s why it’s “hidden”, well actually just not released yet, as it’s a feature that was not exposed completely. Because the enums are not well named, and because they also give limitations, such as not being able to call SetData. Especially as we don’t have any sort of validation layer yet so doing something might work on platform X by accident but break on platform Y. The default mode is guaranteed to work the same everywhere.
You are correct that the documentation is wrong. But it’s not anything users are meant to change for the short term. That’s why the ComputeBuffer constructor doesn’t even tell that there is the secret ComputeBufferMode that can be passed there. It’s not ideal but it definitely will be fixed when we get around to exposing that.
Rest assured we are well aware of these and do understand the annoyance users have with these. Personally I just hope we’d have time sooner rather than later to get around to fixing this particular can of worms.
Ah ok, now I see what you mean by hidden. I found ComputeBufferMode randomly through the documentation, but I see now that the specific constructor taking it in isn’t even listed on the ComputeBuffer page. Sorry, I was under the impression that this stuff was more public and finished than it apparently was.
Really looking forward to whenever all this stuff is properly finished, released, and documented. I think it’s always good (and fun) when we get more fine-grained access to more low-level stuff like this (even with limitations, as long as it’s properly documented etc).
I see, does this have to do with triple buffering and v-sync (since you mentioned 4 frames specifically)?
Would waiting for more frames potentially be safer? Like say this scenario:
1. Write data to a staging buffer through BeginWrite
2. Copy data from the staging buffer to a second buffer using a compute shader or something (the second buffer is the one actually used for rendering)
3. Wait for 10 frames (assume the copying will be done by then)
4. Go to 1
If so, is there a number of frames we can wait where we can essentially guarantee that, unless the planets align, it’s safe to write to the buffer again? (Knowing of course that this stuff is unsafe, and I promise I won’t demand your head on a spike if it blows up spectacularly :P)
Partly. On Vulkan and Metal we can give internally a guarantee that we won’t go over a certain threshold (but that’s subject to change and users cannot really query it so I better not say it lest people start using it locking us in). And some platforms follow the Quality settings and some don’t, so it’s a mess but understandable mess as it was written before any of this was even considered. On DX11 there is no real way of doing it at all, except with DX11.3 fence which is not available on any older DX11.
More frames is definitely safer, but 10 is definitely overkill and you’ll likely just waste a ton of memory doing it. Even so if you do go to the unsafe territory in DX11 the only thing that happens is graphical glitches, so if you are not actually doing any gameplay or other functionality you just get some new data being used instead of old data when drawing.
But most importantly it should be profiled. If you’re just doing stuff on the main thread, it’s likely that you won’t see much improvement, if any. The big reason why Hybrid uses BeginWrite is that it writes from Burst jobs, and doing a normal SetData would effectively serialize all of that onto the main thread, ruining the whole thing. So pointers are passed to jobs and they just write stuff into GPU memory. That’s where you’ll see some real gains. If your game can work with that (Burst is available in normal Unity and usable now), you’d likely see some major gains. Just write it on the assumption that there will be a way to query either how many frames is safe (unlikely) or whether a certain frame is complete (more likely), and you’ll be good to go when we get that part out.
I see. Is the problem “just” that the GPU can end up seeing newer data, or can it see a partial mix of old and new data? Like, is a whole buffer updated atomically?
If the GPU can see partially new data, can I make any assumption about atomicity at all? For example, can I assume that aligned 32-bit or 128-bit values are updated atomically and that I can’t see partial writes on those bit widths?
Right, my main plan was to use it to stream data from disk onto the GPU, similar to what I wrote above. Read some geometry data from disk, store it to a staging buffer using BeginWrite, copy from staging buffer to a geometry buffer (used for rendering), wait some frames for safety then start reading more geometry data from disk. For our application waiting a couple of frames between reading stuff from disk is completely fine. Also, the data is not used for anything other than our own rendering.
The other stuff I’m considering using it for depends on the atomicity above. If it turns out that the whole buffer is updated atomically I might use it for setting up some commands for dispatching indirect rendering.
In both of these cases I either plan to go, or am already going, wide using jobs and Burst, and the thing I’m trying to avoid is the serialized main thread call to SetData. But yeah, serious profiling is planned for all of this stuff.
It’s partial data. But even this is already against the spec (what little spec DX11 even has). No guarantees of atomicity can be made. I would guess that 32bit would be atomic, but wouldn’t be too surprised if it’s all just garbage.
The plan ought to be modified like this:
Write to the staging buffer using BeginWrite, issue a copy to the geometry buffer, draw using that buffer, and wait several frames before reusing the staging buffer. This works because the copy happens on the GPU timeline, which is fully synced. The unsafe part is knowing when the GPU timeline commands are done with the data. So you need to wait for the copy command to finish before you reuse the staging buffer.
To be super precise you don’t even need to write to the staging buffer when you issue the copy command, only when the copy command is actually running on the GPU it must have been written into. But this is something that cannot be reasonably achieved in practice. So better to have the data there ready for it before you even issue the copy command.
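The “wait several frames to reuse the staging buffer” bookkeeping can be sketched in plain C#, independent of any Unity API. This is only an illustration under stated assumptions: `StagingSlotPool`, `TryAcquire`, and the frame threshold are hypothetical names and values, and counting frames is a stand-in for the real CPU-GPU sync that is not available yet.

```csharp
using System;

// Hypothetical helper: tracks which staging slots are old enough to reuse.
// A slot handed out at frame F becomes available again at frame F + framesToWait.
// This is frame counting only; it does NOT prove the GPU is done with the slot.
public sealed class StagingSlotPool
{
    private readonly long[] issuedAtFrame;
    private readonly long framesToWait;

    public StagingSlotPool(int slotCount, int framesToWait)
    {
        if (slotCount <= 0) throw new ArgumentOutOfRangeException(nameof(slotCount));
        if (framesToWait <= 0) throw new ArgumentOutOfRangeException(nameof(framesToWait));

        this.framesToWait = framesToWait;
        issuedAtFrame = new long[slotCount];

        // Mark every slot as "issued long enough ago" so it is free at frame 0.
        for (int i = 0; i < slotCount; i++)
            issuedAtFrame[i] = -framesToWait;
    }

    // Returns the index of a reusable slot and stamps it with currentFrame,
    // or returns -1 if every slot was used too recently.
    public int TryAcquire(long currentFrame)
    {
        for (int i = 0; i < issuedAtFrame.Length; i++)
        {
            if (currentFrame - issuedAtFrame[i] >= framesToWait)
            {
                issuedAtFrame[i] = currentFrame;
                return i;
            }
        }
        return -1;
    }
}
```

In use, the caller would write to the staging buffer for the returned slot, issue the copy, and skip the upload for that frame whenever `TryAcquire` returns -1. Once an API for querying frame completion exists, the frame comparison is the part that would be swapped out.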
Right, I see. I think I have a better idea of how the compute buffer stuff can be used in my situation now. Thanks for clarifying all of this, you’ve saved me a lot of experimentation, frustration, and debugging
Then I suggest putting that in the documentation. Or at the very least, mark it as an experimental feature.
No, it does. It has been exposed since version 2021.2.
I can’t overstate how much confusion Unity’s documentation has caused us because of this.
Anyway, many thanks to Per-Morten and tvirolai for the in-depth discussion. Maybe the docs should just link this thread instead. Right now it is the only useful source of information on the subject…