You can see an example in the attached image. The generated shader is full of unnecessary bit shifts as well as calls to “bitFieldInsert”. Why does this even happen in the first place? According to the Xcode profiler, removing these (they also contribute to the “integer ALU” section) would improve performance by ~20%, assuming only half of the integer ALU work goes away. That’s a pretty good saving for the exact same functionality. Some of these shifts appear to be divides that the compiler converts to shifts because it detects a power-of-two (POT) divisor, but others seem to be entirely unnecessary.
The generated shaders also seem to use fma instead of matrix multiply functions, which I would guess is slower if the hardware has dedicated matrix-multiply support (not the best evidence, I know).
So why is the generated shader code so poor, and how can it be fixed? Any Unity shader/rendering team experts here?