Does using an RWStructuredBuffer as a StructuredBuffer have a performance penalty?

In many Unity projects I see a common pattern: shaders and buffers are forward-declared, a buffer is written in one compute shader as a UAV and is then read in another as an SRV, even though it is still declared as RWStructuredBuffer. This may hurt performance, because binding it as a UAV may cause unnecessary cache flushes.
Is there any way to declare a kernel in its own region, with all of its resources, in the same shader file? For example:
#pragma kernel kernel1
StructuredBuffer buffer1;
RWStructuredBuffer buffer2;

#pragma kernel kernel2
StructuredBuffer buffer2;
RWStructuredBuffer buffer3;

That way the resource state would be correct for each kernel. It would also greatly improve readability in RenderDoc.

Right now there’s no way to have the resources declared per kernel in the same shader file.
You could use different names for them.
Like this:
StructuredBuffer kernel1_buffer1;
RWStructuredBuffer kernel1_buffer2;
StructuredBuffer kernel2_buffer2;
RWStructuredBuffer kernel2_buffer3;

Thanks a lot!
But I see that it is possible to declare a define per shader kernel. Is that suitable for isolating a kernel's code and its resources?

Yes, but you’d need to have a define per kernel.


Sorry to necro this thread, but I’d like more information on this.

I have a ComputeShader with multiple kernels. One kernel writes to a RWStructuredBuffer and another reads from it. I am wondering whether it is beneficial to have a second uniform StructuredBuffer (not RW) for the read-only kernel rather than reading from the uniform RWStructuredBuffer.

Could you elaborate a bit more on the “define per kernel”? How would this work?

Hi! There’s an example here: Unity - Manual: Compute shaders

HLSL example:
#pragma kernel WriteDataKernel WRITE_TO_BUFFER
#pragma kernel ReadDataKernel READ_FROM_BUFFER

#if defined(WRITE_TO_BUFFER)
RWStructuredBuffer Buffer;
#elif defined(READ_FROM_BUFFER)
StructuredBuffer Buffer;
#endif

The define is enabled automatically for that kernel; you don’t need to set it yourself before dispatching.


Thank you. Before I update all of my ComputeShaders, I assume that it is indeed beneficial to do this? Do you know of somewhere we can read more about the performance differences between Structured/RWStructuredBuffer? Are there more concerns than performance? For example, can having a ComputeBuffer bound to a RWStructuredBuffer instead of a StructuredBuffer affect the ordering of ComputeShader execution or anything like that?

I don’t know if it will be beneficial for you. Always profile! Use PIX, for example. The shader compiler might figure out that you don’t actually write into the RWStructuredBuffer, in which case it won’t matter. The order of shader dispatch depends on your GPU resource read and write dependencies and on compute unit availability. When rendering a scene, you can see in PIX that the GPU executes multiple draw calls at the same time, but this is usually not the case with compute shaders. The GPU can parallelize dispatches if there are no dependencies between them and they are small enough to run in parallel.


Thanks again, I am attempting to profile this but can’t get my shader to compile using your example code above. Unity displays an error on the line first accessing the buffer. If you are able, could you take a quick look at this minimal example? Do I need to use the keywords when accessing the buffer also?

The error in question:
“Shader error in ‘Test’: l-value specifies const object at kernel Copy at Test.compute(18) (on d3d11)”

#pragma kernel Fill WRITE_TO_BUFFER
#pragma kernel Copy READ_FROM_BUFFER

#if defined(WRITE_TO_BUFFER)
uniform RWStructuredBuffer<uint> _FillBuffer;
#elif defined(READ_FROM_BUFFER)
uniform StructuredBuffer<uint> _FillBuffer;
#endif

uniform RWStructuredBuffer<uint> _CopyToBuffer;
uniform uint _Count;

[numthreads(256,1,1)]
void Fill (uint3 id : SV_DispatchThreadID)
{
    if (id.x >= _Count) return;

    _FillBuffer[id.x] = id.x;
}

[numthreads(256,1,1)]
void Copy (uint3 id : SV_DispatchThreadID)
{
    if (id.x >= _Count) return;

    _CopyToBuffer[id.x] = _FillBuffer[id.x];
}

Yes, you do need to use the keywords around the code that reads or writes those buffers. For example, when the READ_FROM_BUFFER variant is compiled, the code in the Fill kernel produces that compilation error, because it tries to write into a StructuredBuffer.

Try

uniform uint _Count;

#if defined(WRITE_TO_BUFFER)
uniform RWStructuredBuffer<uint> _FillBuffer;

[numthreads(256,1,1)]
void Fill (uint3 id : SV_DispatchThreadID)
{
    if (id.x >= _Count) return;
    _FillBuffer[id.x] = id.x;
}
#elif defined(READ_FROM_BUFFER)
uniform StructuredBuffer<uint> _FillBuffer;
uniform RWStructuredBuffer<uint> _CopyToBuffer;

[numthreads(256,1,1)]
void Copy (uint3 id : SV_DispatchThreadID)
{
    if (id.x >= _Count) return;
    _CopyToBuffer[id.x] = _FillBuffer[id.x];
}
#endif

There is no need to use the uniform keyword.
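Putting the pieces from this thread together, a complete file might look like the sketch below. It reuses the Fill/Copy names from the question, keeps the `#pragma kernel` lines with their per-kernel defines, and drops `uniform`. Treat it as an untested sketch under those assumptions, not as a verified fix.

```hlsl
// Each kernel is compiled with its own define enabled.
#pragma kernel Fill WRITE_TO_BUFFER
#pragma kernel Copy READ_FROM_BUFFER

uint _Count; // shared by both kernels

#if defined(WRITE_TO_BUFFER)
// Variant compiled for Fill: _FillBuffer is writable (UAV).
RWStructuredBuffer<uint> _FillBuffer;

[numthreads(256, 1, 1)]
void Fill(uint3 id : SV_DispatchThreadID)
{
    if (id.x >= _Count) return;
    _FillBuffer[id.x] = id.x;
}
#elif defined(READ_FROM_BUFFER)
// Variant compiled for Copy: _FillBuffer is read-only (SRV).
StructuredBuffer<uint> _FillBuffer;
RWStructuredBuffer<uint> _CopyToBuffer;

[numthreads(256, 1, 1)]
void Copy(uint3 id : SV_DispatchThreadID)
{
    if (id.x >= _Count) return;
    _CopyToBuffer[id.x] = _FillBuffer[id.x];
}
#endif
```

Because the write in Fill sits inside the WRITE_TO_BUFFER block, it is never compiled against the read-only declaration, which is what triggered the “l-value specifies const object” error.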
