Hi, I am calculating the bounds of a mesh on the GPU, and while profiling I found that this is the bottleneck of my GPU work, probably because of the atomic writes. Is there any way to speed up this operation? Here’s my very simple compute shader code:
void CreateBinMinMax(uint3 id : SV_DispatchThreadID) {
    if (id.x >= numParticles) return;
    float3 position = positions[id.x].xyz;
    // Note: comparing asuint() bit patterns gives the same order as comparing
    // the floats only when the values are non-negative.
    InterlockedMin(minMaxCoords[0].minX, asuint(position.x));
    InterlockedMax(minMaxCoords[0].maxX, asuint(position.x));
    InterlockedMin(minMaxCoords[0].minY, asuint(position.y));
    InterlockedMax(minMaxCoords[0].maxY, asuint(position.y));
    InterlockedMin(minMaxCoords[0].minZ, asuint(position.z));
    InterlockedMax(minMaxCoords[0].maxZ, asuint(position.z));
}
I’ve already tried many things, like playing around with the thread group size, to no avail. Any help would be much appreciated!
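One thing worth checking separately from performance: InterlockedMin/InterlockedMax on asuint() values compare raw bit patterns. Negative floats have the sign bit set, so they compare as larger than every positive float, and a more negative value compares as larger than a less negative one. If any coordinate can be negative, the bounds come out wrong. A common fix is to store an order-preserving key instead of the raw bits. The sketch below assumes the buffer layout implied by the post; the struct name `MinMaxCoords`, the buffer element types, the helper names and the 64-thread group size are all hypothetical.

```hlsl
// Hypothetical declarations: the post does not show them.
struct MinMaxCoords
{
    uint minX, maxX, minY, maxY, minZ, maxZ;
};

StructuredBuffer<float4> positions;
RWStructuredBuffer<MinMaxCoords> minMaxCoords;
uint numParticles;

// Maps a float to a uint whose unsigned order matches the float order.
uint FloatToOrderedUint(float f)
{
    uint u = asuint(f);
    // Negative: flip all bits, which reverses the order among negatives and
    // places them below all non-negatives. Non-negative: set the sign bit.
    return ((u >> 31) != 0) ? ~u : (u | 0x80000000u);
}

// Inverse of FloatToOrderedUint, for decoding the final min/max values.
float OrderedUintToFloat(uint key)
{
    return asfloat(((key >> 31) != 0) ? (key & 0x7FFFFFFFu) : ~key);
}

[numthreads(64, 1, 1)] // assumed group size; tune as needed
void CreateBinMinMax(uint3 id : SV_DispatchThreadID)
{
    if (id.x >= numParticles) return;
    float3 position = positions[id.x].xyz;

    // Convert once, then use the keys for all six atomics.
    uint3 key = uint3(FloatToOrderedUint(position.x),
                      FloatToOrderedUint(position.y),
                      FloatToOrderedUint(position.z));

    InterlockedMin(minMaxCoords[0].minX, key.x);
    InterlockedMax(minMaxCoords[0].maxX, key.x);
    InterlockedMin(minMaxCoords[0].minY, key.y);
    InterlockedMax(minMaxCoords[0].maxY, key.y);
    InterlockedMin(minMaxCoords[0].minZ, key.z);
    InterlockedMax(minMaxCoords[0].maxZ, key.z);
}
```

With this encoding the host would need to reset the min fields to 0xFFFFFFFF and the max fields to 0 before each dispatch, then decode the results with the same inverse mapping as `OrderedUintToFloat` (on the CPU or in a follow-up pass). This does not reduce the number of atomics. It only makes the result correct for negative coordinates.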
Thanks! I found another solution. It is a bit more complicated, but it mitigates the issue by using groupshared variables, and for this task it seems to perform much better than doing every atomic directly on the RWStructuredBuffer.
All variables with the “_local” suffix are groupshared, so the RWStructuredBuffer only has to be interlocked once per group instead of once per thread, which seems to greatly reduce the overhead.
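A minimal sketch of the group-local reduction described above follows. It is not the poster’s actual code, which isn’t shown. The `_local` variable names follow the naming convention from the post. `MinMaxCoords`, `GROUP_SIZE`, `FloatToOrderedUint` and the buffer element types are hypothetical. Each thread does its atomics on groupshared memory, and one thread per group then merges the group’s result into the global buffer. Positions are encoded with an order-preserving key so that negative coordinates compare correctly.

```hlsl
#define GROUP_SIZE 64 // assumed; must match numthreads below

// Hypothetical declarations: the post does not show them.
struct MinMaxCoords { uint minX, maxX, minY, maxY, minZ, maxZ; };

StructuredBuffer<float4> positions;
RWStructuredBuffer<MinMaxCoords> minMaxCoords;
uint numParticles;

groupshared uint minX_local, maxX_local, minY_local, maxY_local, minZ_local, maxZ_local;

// Order-preserving float -> uint key (negatives: flip all bits, else set sign bit).
uint FloatToOrderedUint(float f)
{
    uint u = asuint(f);
    return ((u >> 31) != 0) ? ~u : (u | 0x80000000u);
}

[numthreads(GROUP_SIZE, 1, 1)]
void CreateBinMinMax(uint3 id : SV_DispatchThreadID, uint gi : SV_GroupIndex)
{
    // One thread seeds the group-local accumulators with neutral values.
    if (gi == 0)
    {
        minX_local = 0xFFFFFFFFu; maxX_local = 0u;
        minY_local = 0xFFFFFFFFu; maxY_local = 0u;
        minZ_local = 0xFFFFFFFFu; maxZ_local = 0u;
    }
    GroupMemoryBarrierWithGroupSync();

    // No early return: every thread in the group must reach both barriers.
    if (id.x < numParticles)
    {
        float3 p = positions[id.x].xyz;
        uint3 key = uint3(FloatToOrderedUint(p.x),
                          FloatToOrderedUint(p.y),
                          FloatToOrderedUint(p.z));
        // Atomics on groupshared memory are much cheaper than on a global buffer.
        InterlockedMin(minX_local, key.x); InterlockedMax(maxX_local, key.x);
        InterlockedMin(minY_local, key.y); InterlockedMax(maxY_local, key.y);
        InterlockedMin(minZ_local, key.z); InterlockedMax(maxZ_local, key.z);
    }
    GroupMemoryBarrierWithGroupSync();

    // Only one thread per group touches the global buffer.
    if (gi == 0)
    {
        InterlockedMin(minMaxCoords[0].minX, minX_local);
        InterlockedMax(minMaxCoords[0].maxX, maxX_local);
        InterlockedMin(minMaxCoords[0].minY, minY_local);
        InterlockedMax(minMaxCoords[0].maxY, maxY_local);
        InterlockedMin(minMaxCoords[0].minZ, minZ_local);
        InterlockedMax(minMaxCoords[0].maxZ, maxZ_local);
    }
}
```

The early `return` from the original kernel is replaced by an `if` because a thread that exits before `GroupMemoryBarrierWithGroupSync` would leave the rest of the group waiting on a barrier it never reaches. A group whose threads are all out of range simply merges the neutral values, which leaves the global result unchanged. As with any order-preserving encoding, the host would reset the min fields to 0xFFFFFFFF and the max fields to 0 before dispatch and decode the stored keys back to floats afterwards.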