Performance impacts of loops on compute shaders?

I am new to compute shaders (and shaders in general). I know that conditional branches make a shader's execution less predictable, which degrades performance. I was wondering whether for-loops have a similar performance impact, or whether they are a performant option when they loop a predetermined number of times.

If the loop runs a hard-coded number of times, the compiler will usually just unroll it, so the cost is static as well. If the loop count varies at runtime, the compiler can't unroll it, and iterating dynamically can add some extra cost. The bigger performance issue, though, is usually variance in loop counts between threads in the same warp (the set of threads the GPU runs together in lockstep, 32 threads on Nvidia hardware). The whole warp has to wait for every thread in it to finish before it can be freed up to work on another set of data. So if there's a lot of per-thread variance in your loop counts, a lot of resources can end up held up waiting on the slowest threads.
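
To put rough numbers on the variance point, here is a small Python sketch (a CPU-side toy model, not shader code). It assumes a simplified lockstep model where a warp stays occupied for as many iterations as its slowest thread needs. The names `WARP_SIZE` and `warp_cost` are made up for this illustration, and real hardware scheduling is more complicated than this.

```python
WARP_SIZE = 32  # warp width on Nvidia hardware; other GPUs may differ


def warp_cost(loop_counts, warp_size=WARP_SIZE):
    """Return (useful, occupied) lane-iterations under a simplified lockstep model.

    useful   = iterations the threads actually need
    occupied = iterations the hardware lanes are tied up for, assuming each
               warp runs until its slowest thread finishes
    """
    useful = sum(loop_counts)
    occupied = 0
    for start in range(0, len(loop_counts), warp_size):
        warp = loop_counts[start:start + warp_size]
        # Every lane in the warp is held for the longest loop in that warp.
        occupied += max(warp) * warp_size
    return useful, occupied


uniform = [10] * 32         # every thread loops 10 times
skewed = [1] * 31 + [100]   # one thread loops 100 times, the rest once

for name, counts in (("uniform", uniform), ("skewed", skewed)):
    useful, occupied = warp_cost(counts)
    print(f"{name}: {useful} useful of {occupied} occupied lane-iterations "
          f"({useful / occupied:.0%} utilization)")
```

In this model the uniform case uses 320 of 320 lane-iterations, while the skewed case uses only 131 of 3200 (about 4%), even though 31 of its 32 threads do almost no work. That is the "held up resources" effect described above.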