I was reading an interesting article on loop alignment in .NET 6. It seemed like a fascinating optimization topic, and I thought of Burst right away! With the addition of new Burst hints like Likely/Unlikely, might it make sense to add loop alignment hints as well? Or is it possible that Burst/LLVM takes care of this already? I don’t think I’ve ever noticed NOP padding in the generated assembly.
I could imagine syntax something like this:
void Execute() {
    //stuff
    Burst.CompilerServices.Hint.AlignLoop();
    for (int i = 0; i < array.Length; i++) {
        //loop
    }
}
I read generated x86 assembly a lot when working with Burst, and it often contains small loops.
When the code reaches such a loop, I often see this:
.p2align 4, 0x90
.LBB0_10:
vpaddb ymm6, ymm6, ymmword ptr [rax + rdx]
vpaddb ymm5, ymm5, ymmword ptr [rax + rdx + 32]
vpaddb ymm4, ymm4, ymmword ptr [rax + rdx + 64]
vpaddb ymm3, ymm3, ymmword ptr [rax + rdx + 96]
sub rdx, -128
cmp edx, 32640
jne .LBB0_10
Your post interests me, but I was not aware of such an optimization, although it makes a lot of sense in hindsight. I googled .p2align, and the first result at https://stackoverflow.com/questions/21546946/what-does-p2align-do-in-asm-code suggests that LLVM already performs that optimization. Hope it helps!
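For anyone curious what that directive works out to: as far as I understand it, `.p2align 4, 0x90` asks the assembler to pad up to the next multiple of 2^4 = 16 bytes, with 0x90 (the one-byte x86 NOP) given as the fill value. Here is a minimal sketch of just the arithmetic. The `AlignmentMath` class, the `PaddingBytes` helper, and the addresses are made up for illustration and are not part of Burst or LLVM.

```csharp
using System;

// Hypothetical helper for illustration only; not a Burst or LLVM API.
public static class AlignmentMath
{
    // Number of padding bytes needed to move `address` up to the next
    // multiple of 2^log2Alignment (0 if it is already aligned).
    public static ulong PaddingBytes(ulong address, int log2Alignment)
    {
        ulong alignment = 1UL << log2Alignment; // .p2align 4 -> 16 bytes
        ulong mask = alignment - 1;
        return (alignment - (address & mask)) & mask;
    }

    public static void Main()
    {
        // Made-up addresses for where a loop head might otherwise land.
        Console.WriteLine(PaddingBytes(0x1003, 4)); // 3 bytes past a boundary
        Console.WriteLine(PaddingBytes(0x1010, 4)); // already on a boundary
    }
}
```

A loop head that would otherwise land 3 bytes past a 16-byte boundary needs 13 bytes of padding, while one already on a boundary needs none. That might also be why the padding is easy to miss: it only appears when the loop label happens to fall off a boundary.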
Ah, very cool! I did not know about the .p2align directive. Sounds like this is indeed already working just as I had hoped. Thanks for the info!
That really was a great article though - so thanks for sharing! We’ve made a note of it in case there is anything more we can do (like the hint you suggested) to make LLVM optimize the code even better.