"Abnormal efficiency" groupshared in compute shader

I need to use groupshared memory in a compute shader, but I've found that a 16x16 two-dimensional array performs much worse than a flat 256-element array.
Unity 2022.3.7f1 LTS with URP.

My SSR_RESOLVE compute shader is similar to the one in the FidelityFX SDK. In Unity, the `groupshared [16][16]` version takes 2658 μs and the `groupshared [256]` version takes 512 μs, while the FidelityFX SDK's ffx_prefilter with `groupshared [16][16]` takes 545 μs.


When using groupshared memory in Unity compute shaders, the array layout you declare can matter for performance. A flat, 256-element array is often handled more efficiently than a two-dimensional 16x16 array: with a 1D array you compute the index yourself, so the addressing (and therefore the LDS access pattern) is explicit, whereas a 2D declaration leaves the index arithmetic and layout to the shader compiler, which may generate less efficient code.

The numbers you measured point the same way: the 1D version in Unity (512 μs) is close to the FidelityFX SDK's 2D version (545 μs), which suggests the slowdown comes from how Unity's shader compilation handles the 2D declaration rather than from the algorithm itself. As a practical workaround, flatten the groupshared array to 1D and index it manually; the final decision should still balance that performance gain against readability and the needs of your project.
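A minimal sketch of the flattening workaround (names like `g_tile`, `TILE_DIM`, and `TileIndex` are illustrative, not from the original shader):

```hlsl
// 2D declaration that measured slow in Unity:
// groupshared float4 g_tile[16][16];

// Flat equivalent: the same 256 float4 slots, but with explicit indexing.
#define TILE_DIM 16
groupshared float4 g_tile[TILE_DIM * TILE_DIM];

// Map a 2D thread coordinate inside the group to the flat index.
uint TileIndex(uint2 coord)
{
    return coord.y * TILE_DIM + coord.x;
}

[numthreads(TILE_DIM, TILE_DIM, 1)]
void CSMain(uint3 gtid : SV_GroupThreadID)
{
    // Write and read through the helper instead of g_tile[y][x].
    g_tile[TileIndex(gtid.xy)] = float4(0, 0, 0, 0);
    GroupMemoryBarrierWithGroupSync();
    // ... resolve work reading neighbors via TileIndex(...) ...
}
```

The only change from the 2D version is the declaration and the `y * TILE_DIM + x` addressing; the memory contents and barrier usage are identical, so the shader's behavior is unchanged.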

@andrew-lukasik Hello, could someone take a look at this issue?

Is there any progress on this?