I’m trying to wrap my head around compute shaders, but I keep getting unexpected results.
So, I’m trying to use a ComputeShader to perform an operation that, sadly, isn’t easily parallelized, because it requires summing over every pixel of a Texture2D. My solution was to create a kernel with
(SV_GroupIndex being the 1D equivalent of SV_GroupThreadID), then pass a 64-float ComputeBuffer as partial_sum, and then sum the 64 partial sums in my regular code with a for loop. With this test code I expected the result to equal the size of my texture (its pixel count). But it turns out the number is noticeably smaller, and I can’t figure out why. Am I misunderstanding how this should work?