ComputeShaders and SV_GroupIndex

Hi all,

I’m trying to wrap my head around ComputeShaders, but I’m still getting results I don’t expect.
I’m trying to use a ComputeShader for an operation that sadly isn’t easy to parallelize, as it requires summing over every pixel of a Texture2D. My solution was to create a kernel like this:

[numthreads(8,8,1)]
void CSFilling (uint3 id : SV_DispatchThreadID, uint group_i : SV_GroupIndex)
{
    partial_sum[group_i] += 1;
}

(SV_GroupIndex being the 1D equivalent of SV_GroupThreadID.) I then pass a ComputeBuffer of 64 floats as partial_sum and add up the 64 partial sums in my regular code with a for loop. With this test code I expected the result to equal the number of pixels in my texture, but the number I get is considerably smaller, and I can’t figure out why. Am I misunderstanding how this should work?

https://scrawkblog.com/category/directcompute/ explains this very well.
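
A likely cause of the low total, going only by the kernel posted above: SV_GroupIndex runs from 0 to 63 inside every thread group, so all groups write to the same 64 slots. `partial_sum[group_i] += 1` is a plain read-modify-write, and when threads from different groups hit the same slot at the same time, some increments get lost. Below is a minimal sketch of one way around that, not a definitive fix. It assumes partial_sum can be declared as a uint buffer instead of float (InterlockedAdd only works on int/uint) and that the buffer is cleared to zero before each Dispatch.

```hlsl
// Assumption: partial_sum holds 64 uints (not floats as in the original post)
// and is zeroed from the C# side before every Dispatch.
RWStructuredBuffer<uint> partial_sum;

[numthreads(8,8,1)]
void CSFilling (uint3 id : SV_DispatchThreadID, uint group_i : SV_GroupIndex)
{
    // Atomic add: threads from different groups that share the same
    // group_i slot can no longer overwrite each other's increment.
    InterlockedAdd(partial_sum[group_i], 1);
}
```

With this version the 64 slots should add up to the total number of threads dispatched, i.e. 64 times the number of groups. That only matches the pixel count when the texture width and height are multiples of 8; otherwise the kernel also needs a bounds check on id.xy before adding.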