Hi guys,
I am trying to implement some algorithms in my project. All the kernels execute their operations in parallel, but the last one has to run serially. I use a single thread to iterate over a fairly large array (around 50K elements), in exactly the order in which the data was sorted earlier. Here is the code of the problematic kernel:
[numthreads(1, 1, 1)]
void CsSelectRates(uint3 id : SV_DispatchThreadID)
{
    // Bandwidth for each state index
    uint b[7] = { 1, 2, 2, 4, 8, 8, 16 };
    float Q = 0;
    int c = B - N;
    int tileIndex;
    int stateIndex;
    int selectionStateIndex;
    float currentQuality;
    float selectionQuality;
    int currentBandwith;
    int selectionBandwith;
    uint selectedState;

    // Start from the quality of state 0 for every tile
    for (int i = 0; i < N; i++)
    {
        Q += qualityBuffer[i].quality[0];
    }

    // Reset the selection of every tile to state 0
    for (int j = 0; j < N; j++)
    {
        selectionBuffer[j] = 0;
    }

    // Serial pass over the sorted entries; r is declared only here
    // (the original also declared an unused "int r = 0;" above the loops)
    for (int r = 0; r < lastElement; r++)
    {
        tileIndex = (int)finalRatiosBuffer[r].y;
        stateIndex = (int)finalRatiosBuffer[r].z;
        selectionStateIndex = selectionBuffer[tileIndex];
        currentQuality = qualityBuffer[tileIndex].quality[stateIndex];
        selectionQuality = qualityBuffer[tileIndex].quality[selectionStateIndex];
        currentBandwith = b[stateIndex];
        selectionBandwith = b[selectionStateIndex];
        selectedState = selectionStateIndex;

        // Switch to the candidate state only if it improves quality
        // and c stays positive after the swap
        if (currentQuality > selectionQuality && c - currentBandwith + selectionBandwith > 0)
        {
            c = c - currentBandwith + selectionBandwith;
            Q = Q + currentQuality - selectionQuality;
            selectedState = stateIndex;
        }
        selectionBuffer[tileIndex] = selectedState;
    }
}
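For anyone who wants to check the results outside the shader, here is a minimal CPU sketch of the same serial loop in Python. The data layout is an assumption on my part: quality is a list of per-tile lists indexed by state, and each entry of ratios is a (ratio, tileIndex, stateIndex) tuple mirroring the .y and .z fields of finalRatiosBuffer. The names select_rates and budget (standing in for B) are made up for illustration.

```python
# CPU reference sketch of the serial loop in CsSelectRates.
# Assumed layout (not taken from the attached project):
#   quality[tile][state] -> float, ratios[r] -> (ratio, tile, state)

BANDWIDTH = [1, 2, 2, 4, 8, 8, 16]  # same table as b[] in the kernel


def select_rates(quality, ratios, budget):
    n = len(quality)
    selection = [0] * n                          # every tile starts in state 0
    total_quality = sum(q[0] for q in quality)   # Q in the kernel
    capacity = budget - n                        # c = B - N in the kernel

    for _, tile, state in ratios:                # ratios must already be sorted
        current = selection[tile]
        new_capacity = capacity - BANDWIDTH[state] + BANDWIDTH[current]
        if quality[tile][state] > quality[tile][current] and new_capacity > 0:
            capacity = new_capacity
            total_quality += quality[tile][state] - quality[tile][current]
            selection[tile] = state

    return selection, total_quality, capacity


if __name__ == "__main__":
    # Tiny made-up input: 2 tiles, 7 states each
    quality = [[1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0], [1.0] * 7]
    ratios = [(0.9, 0, 1), (0.8, 1, 1), (0.7, 0, 6)]
    print(select_rates(quality, ratios, budget=6))
```

Comparing the returned selection list with the contents of selectionBuffer read back from the GPU should show whether the kernel produces the expected result for a given input.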
The for loop itself doesn't seem to be the problem; the slowdown only appears once all the operations inside it are performed. If I comment out the line
selectedState = stateIndex; in the for loop, the whole thing processes the data very fast, giving me 5000 FPS on my PC. If I uncomment this line, the performance drops to 150 FPS.
Another interesting thing is that if I reduce the loop to process only 10000 elements it still gives 5000 FPS, but with 20000 elements it drops to 1300 FPS. What is happening here? Why is the performance drop not linear?
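To make the numbers easier to compare, here they are converted to frame times. FPS is the reciprocal of frame time, so it never scales linearly with the amount of work. This is only a rough sketch: it assumes the FPS values are whole-frame measurements, so each time also includes everything else that happens in the frame.

```python
def frame_time_ms(fps):
    """Convert frames per second to milliseconds per frame."""
    return 1000.0 / fps


# FPS values quoted above
measurements = {
    "10000 elements (or line commented out)": 5000,
    "20000 elements": 1300,
    "full array (around 50K elements)": 150,
}

for label, fps in measurements.items():
    print(f"{label}: {frame_time_ms(fps):.2f} ms per frame")
# 10000 elements (or line commented out): 0.20 ms per frame
# 20000 elements: 0.77 ms per frame
# full array (around 50K elements): 6.67 ms per frame
```

Even in milliseconds the cost grows faster than the element count, which is the part I don't understand.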
Is there any way to execute the kernel without a performance drop?
I have attached the C# script and the compute shader code.
6612433–752545–Assets.zip (1.48 KB)