I’m not particularly experienced with low level optimisation, but I’ve been reading a bit about it recently. One thing I’ve come across quite often is Profile Guided Optimisation (PGO), which is a feature offered by compilers like GCC and Clang. The idea is that you first build the code with instrumentation and run it under realistic data loads to gather profiling information about how the application actually behaves. Subsequent compilations can then use that profiling data to optimise further, replacing heuristic guesses about things like branching patterns with recorded runtime data.
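To make that concrete, here is a minimal sketch in plain C (not Burst) of the kind of data-dependent branch I mean. The function name `count_above` and the file names in the comments are made up for illustration. The build steps in the comments use GCC's `-fprofile-generate` and `-fprofile-use` flags, and assume a separate driver with a `main` that feeds in representative data.

```c
#include <stddef.h>

/*
 * Hypothetical example: count the values above a threshold.
 *
 * Whether the branch below is usually taken depends entirely on the
 * data passed in, which the compiler cannot know at compile time.
 *
 * A typical GCC PGO cycle would look roughly like this
 * (file names are assumptions, driver.c must provide main):
 *
 *   gcc -O2 -fprofile-generate count.c driver.c -o app   # instrumented build
 *   ./app                                                # run with realistic data
 *   gcc -O2 -fprofile-use count.c driver.c -o app        # rebuild using the profile
 */
size_t count_above(const int *values, size_t n, int threshold)
{
    size_t hits = 0;
    for (size_t i = 0; i < n; i++) {
        /* Data-dependent branch: the recorded profile tells the
           compiler how often this condition was true in practice. */
        if (values[i] > threshold) {
            hits++;
        }
    }
    return hits;
}
```

The function behaves the same with or without a profile. Only the generated machine code may differ, for example in how the branch is laid out.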
This sounds like it could be a very good fit for DOTS and Burst, where performance is likely to be dependent on patterns in our data which the compiler cannot currently know about. Has any consideration been given to implementing this in Unity? I’m imagining a scenario in which we can play scenes with instrumentation enabled and Burst can optimise its compilation for the data patterns in our game.
I recently made another feedback thread suggesting an intrinsic to give branch prediction hints, and @sheredom responded positively, saying that he’d seen good behaviour from internal hinting around exceptions. One of the major advantages of Profile Guided Optimisation is that it’s able to infer these hints without the developer needing to provide them. If that type of hinting does indeed prove valuable, then PGO is likely to give a significant performance advantage.
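For comparison, this is roughly what a manual hint looks like in C with GCC or Clang, using the `__builtin_expect` builtin. It is only a sketch of the general technique, not the Burst intrinsic I suggested. The `LIKELY`/`UNLIKELY` macros and the `sum_non_negative` function are names I've invented for the example.

```c
#include <stddef.h>

/* Convenience macros over the GCC/Clang builtin (macro names are my own). */
#define LIKELY(x)   __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)

/*
 * Hypothetical example: sum an array, treating a negative entry as a
 * rare error case. Returns -1 as soon as a negative value is found.
 */
long sum_non_negative(const int *values, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        /* The developer asserts by hand that this branch is rarely
           taken. With PGO the compiler would learn that from a
           recorded run instead. */
        if (UNLIKELY(values[i] < 0)) {
            return -1;
        }
        total += values[i];
    }
    return total;
}
```

The hint only affects how the compiler arranges the code, not the result. If the guess is wrong for the real data, the code is still correct, just potentially slower.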
So PGO is something I have lots of opinions on - and I’ll just TL;DR it upfront by stating that I think PGO isn’t great.
So to give a bit more flavour to my clickbait answer - there are so many little issues with PGO that it’s hard to break them all down, but:
It relies on you (or any user) always providing a representative playthrough of your ‘average’ game session, so that the right profiles are being generated.
If you provided a bad scene to profile, you could completely skew the generated code.
It then requires the compiler to have a standard way to feed that information back in. This isn’t as easy as it sounds - because generally the more you optimize the original code, the harder it is to map the optimized code back to where it came from in the original codebase.
From a compiler verification approach PGO is utterly terrifying. PGO basically means that a random profile from a user (let’s say you profile a scene that causes the compiler to see a specific branch taken a ton more) could massively change the code that the compiler produces from the same inputs. From the perspective of a compiler developer this is terrifying - because users could randomly hit on strange and subtle codegen bugs that are very difficult for us to cover with upfront testing or to have confidence about in advance.
PGO isn’t a magic bullet either - the main benefit from it is about expectation of branches taken/not-taken (like your suggestion for having an intrinsic for that), but you really need to have a lot of other things pretty optimal before you’d see a significant gain here. I think Firefox on Linux with PGO got around a 2-5% performance gain, which isn’t bad by any means - but they’d already spent a ton of time optimizing their codebase beforehand.
I’m not saying we’d never look at it - but I personally think there are a lot of bigger fish to fry before we get there.
I was already thinking about complicated things like doing separate profilings for each scene and then compiling different versions, so I realise the complexity and overhead may be great. I’d never thought of the verification issue, that’s very interesting.
That’s all completely fair enough! You know a lot more about this than I do and I’m happy to know it’s been considered and rejected (at least for now) for good reason. Thanks for taking the time to write such a detailed reply!