The issue:
On AMD Radeon, a compute shader writes data at an incorrect offset.
Question: why does this happen?
Is it possible to prevent it somehow if I want to keep using Unity's GraphicsBuffer.IndirectDrawIndexedArgs struct?
On another note, Unity itself stores the data for this struct as an array of uints in its underlying code and reads it with offsets, so I solved the issue with exactly the same approach.
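For reference, here is a minimal sketch of what that uint-array workaround can look like. It assumes the same args buffer is bound as a buffer of plain uints instead of a buffer of structs. The names INDIRECT_ARGS_UINTS, FillIndirectArgsOnUints and the two defines are hypothetical and only used for illustration; INDICES_COUNTER is the counter buffer from the shader further down.

```hlsl
// Sketch only: INDIRECT_ARGS_UINTS, FillIndirectArgsOnUints and the two
// defines below are hypothetical names, not taken from Unity or the project.
#pragma kernel FillIndirectArgsOnUints

// GraphicsBuffer.IndirectDrawIndexedArgs is 5 uints (20 bytes):
// indexCountPerInstance, instanceCount, startIndex, baseVertexIndex, startInstance
#define ARGS_STRIDE_UINTS 5
#define ARGS_INSTANCE_COUNT_OFFSET 1

RWStructuredBuffer<uint> INDIRECT_ARGS_UINTS; // assumed: the args buffer viewed as uints
RWStructuredBuffer<uint> INDICES_COUNTER;     // counter buffer (stride 4)

[numthreads(1,1,1)]
void FillIndirectArgsOnUints(uint3 id : SV_DispatchThreadID)
{
    // Compute the element offset by hand instead of relying on the struct stride.
    uint baseIndex = id.x * ARGS_STRIDE_UINTS;
    INDIRECT_ARGS_UINTS[baseIndex + ARGS_INSTANCE_COUNT_OFFSET] = INDICES_COUNTER[0];
}
```

With this layout the shader never depends on the stride the driver uses for the struct view; each args entry starts at id.x * 5 uints and instanceCount is the second uint of that entry.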
Bug appearance:
AMD GPU (+ AMD CPU) - Editor - Yes
NVIDIA GPU (+ AMD CPU) - Editor - No
AMD, NVIDIA - WebGPU - No (surprisingly)
AMD machine specs:
=== Graphics Device Info ===
Graphics Device Name: AMD Radeon RX 6800S
Graphics Device Type: Direct3D11
Graphics Device Vendor: ATI
Graphics Device ID: 29679
Graphics Device Vendor ID: 4098
Graphics Device Version: Direct3D 11.0 [level 11.1]
Graphics Memory Size: 8136 MB
Graphics Multi Threaded: True
Graphics UV Starts at Top: True
Max Texture Size: 16384
NPOT Support: Full
Max Graphics Buffer Size: 2147483648
=== Shader Capabilities ===
Compute Shaders: True
Geometry Shaders: True
Tessellation Shaders: True
Ray Tracing: False
GPU Instancing: True
=== Render Target Capabilities ===
Supported RenderTargetCount: 8
Supports 3D Render Textures: True
Supports Cubemap Array Textures: True
Graphics Format Support: True
=== System Info ===
Operating System: Windows 11 (10.0.26100) 64bit
Operating System Family: Windows
Processor Type: AMD Ryzen 9 6900HS with Radeon Graphics
Processor Count: 16
Processor Frequency: 3294 MHz
System Memory Size: 31980 MB
=== Unity Info ===
Unity Version: 6000.0.27f1
Platform: WindowsEditor
Build GUID: 00000000000000000000000000000000
Genuine: True
Genuine Check Available: True
=== Quality Settings ===
Current Quality Level: 0
Active Color Space: Linear
Anti Aliasing: 0
Anisotropic Filtering: Enable
Shader code + RenderDoc disassembly
// Compute shader that writes the instance count into the indirect args.
// IndirectArgs is the GPU-side counterpart of GraphicsBuffer.IndirectDrawIndexedArgs.
// INDIRECT_ARGS_BUFFER is an RWStructuredBuffer<IndirectArgs>.
struct IndirectArgs
{
    uint idxc;
    uint instanceCount;
    uint a;
    uint b;
    uint startinstance;
};

[numthreads(1,1,1)]
void FillIndirectArgsOnStruct(uint3 id : SV_DispatchThreadID)
{
    // Doesn't work on AMD, apparently because of some struct alignment issue.
    INDIRECT_ARGS_BUFFER[id.x].instanceCount = INDICES_COUNTER[0];
}
For a dispatch with 2 threads, the counts got written into the first element, at field indices 1 and 2 (instanceCount and field 'a').
On the GPU side the struct is stored properly with a 20-byte stride, and it is initially populated correctly with GraphicsBuffer.SetData.
Here is the shader capture from RenderDoc on AMD:
cs_5_0
dcl_globalFlags refactoringAllowed
dcl_uav_structured u0, 4
dcl_uav_structured u1, 20
dcl_input vThreadID.x
dcl_temps 1
dcl_thread_group 1, 1, 1
0: ld_structured_indexable(structured_buffer, stride=4)(mixed,mixed,mixed,mixed) r0.x, l(0), l(0), u0.xxxx
1: store_structured u1.x, vThreadID.x, l(4), r0.x
2: ret
Also, summoning the GPU god from the Neverending Story realm, @bgolus


