I am profiling my rendering on Quest 2 using OVRGPUProfiler to see where to optimize, and what I am seeing is not consistent with the advice and theory I have read here and there.
I think I have a fairly deep knowledge of how GPUs work and how to optimize the hell out of Unity scenes.
This thread is intended to start a conversation to better understand how the Quest 2 GPU works, rather than ask a simple direct question, so feel free to skip it if you are not interested in getting deep into optimization techniques.
To summarize, what I am looking for is probably some online resource/documentation that explains that GPU in detail.
As far as I understand, the Snapdragon XR2 GPU used on Quest is a tile-based GPU. It is supposed to take advantage of triangle binning to remove overdraw as much as possible. Advice given in several Oculus talks says that a Z prepass is useless and actually even counterproductive, since the GPU performs per-tile triangle sorting in order to execute as few pixel shaders as possible. This is also the reason why you should avoid alpha test as much as possible: it defeats the sorting magic that minimizes pixel shading.
Anyway, I created some simple test cases to see how that works in practice. My test scene is extremely simple: it uses a single material and draws quads roughly covering the field of view. I have set it up so that I can simulate overdraw by packing a given number of quads, each with a slight z offset, into a single mesh, ordered from back to front.
So, with a single quad, I have zero overdraw, but with 8 quads, I would theoretically be invoking pixel shaders for 8x the screen pixel count on a desktop GPU. If the tile-based GPU does what it claims to be good at, I would expect it to sort my polygons properly, optimize out the overdraw, and deliver performance pretty close to my single-quad test.
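For reference, the test mesh is generated roughly along these lines (a simplified sketch, not my exact code; the quad size, z offset, and winding are illustrative):

```csharp
using UnityEngine;

// Sketch of the overdraw test mesh: N screen-covering quads stacked with a small
// z offset, packed into one mesh and ordered back-to-front (farthest quad first).
[RequireComponent(typeof(MeshFilter), typeof(MeshRenderer))]
public class OverdrawTestMesh : MonoBehaviour
{
    public int quadCount = 8;      // 1 = no overdraw, 8 = ~8x pixel shader invocations on an immediate-mode GPU
    public float zOffset = 0.01f;  // slight depth separation between quads

    void Start()
    {
        var verts = new Vector3[quadCount * 4];
        var uvs   = new Vector2[quadCount * 4];
        var tris  = new int[quadCount * 6];

        for (int i = 0; i < quadCount; i++)
        {
            // Farthest quad goes first in the buffers so draw order is back-to-front.
            float z = (quadCount - 1 - i) * zOffset;
            int v = i * 4;
            verts[v + 0] = new Vector3(-1f, -1f, z);
            verts[v + 1] = new Vector3( 1f, -1f, z);
            verts[v + 2] = new Vector3( 1f,  1f, z);
            verts[v + 3] = new Vector3(-1f,  1f, z);
            uvs[v + 0] = new Vector2(0, 0);
            uvs[v + 1] = new Vector2(1, 0);
            uvs[v + 2] = new Vector2(1, 1);
            uvs[v + 3] = new Vector2(0, 1);

            int t = i * 6;
            tris[t + 0] = v; tris[t + 1] = v + 2; tris[t + 2] = v + 1;
            tris[t + 3] = v; tris[t + 4] = v + 3; tris[t + 5] = v + 2;
        }

        var mesh = new Mesh { vertices = verts, uv = uvs, triangles = tris };
        mesh.RecalculateNormals();
        mesh.RecalculateTangents(); // needed for the Standard shader's normal map
        GetComponent<MeshFilter>().mesh = mesh;
    }
}
```

The object is then scaled and placed in front of the camera so the quads roughly fill the view.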
However, this is definitely NOT what OVRGPUProfiler says: I see my GPU usage jump from around 55 to over 70 when switching from the single quad to 8 quads. That performance loss can't be caused by vertex shaders, which are very simple in my case compared to the pixel shaders (I used the Unity Standard shader with albedo, normal map, metallic, detail map, emission, everything I could use to make sure I end up pixel bound on the GPU).
So, what I'm getting at is this:
- mobile GPUs "advertise" their efficiency at handling overdraw to the point that they almost tell you not to worry about it too much
- my tests show that I get a clear performance gain when I DO worry about overdraw
Does anyone have an explanation for this apparent paradox?
Along the same lines, my tests on more complex scenes show that a well-implemented Z prepass (in a single drawcall) is often beneficial when using "heavy" (think PBR) pixel shaders in mobile VR. In the end, "real-world" profiling is what I trust, but I'd still like to understand what is going on behind the scenes, if someone can point me to good resources on the topic.
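To be concrete about what I mean by a single-drawcall prepass, here is a sketch for the built-in render pipeline. The depthOnlyMaterial (a shader that only does ZWrite On / ColorMask 0) and the pre-combined opaque mesh are assumptions for illustration, not something Unity provides out of the box:

```csharp
using UnityEngine;
using UnityEngine.Rendering;

// Sketch: inject a depth-only prepass before the forward opaque pass.
// All opaque geometry is assumed to be merged into combinedOpaqueMesh so the
// prepass costs a single draw call; depthOnlyMaterial writes depth only.
[RequireComponent(typeof(Camera))]
public class DepthPrepass : MonoBehaviour
{
    public Material depthOnlyMaterial;
    public Mesh combinedOpaqueMesh;

    CommandBuffer cb;

    void OnEnable()
    {
        cb = new CommandBuffer { name = "Z Prepass" };
        cb.DrawMesh(combinedOpaqueMesh, Matrix4x4.identity, depthOnlyMaterial);
        GetComponent<Camera>().AddCommandBuffer(CameraEvent.BeforeForwardOpaque, cb);
    }

    void OnDisable()
    {
        GetComponent<Camera>().RemoveCommandBuffer(CameraEvent.BeforeForwardOpaque, cb);
        cb.Release();
    }
}
```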
Finally, I read that when using alpha test, I should make sure that I draw that geometry AFTER opaque. Fine. That's what Unity does by default anyway (with renderQueue usually set to 2450 when opaque is 2000). But hell, if the GPU does some magic to handle all drawcalls at once for a given tile, where does the render queue idea fit into that?
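Just to spell out the ordering I mean (a trivial sketch; forcing the queue from script is only needed if the shader isn't already tagged as AlphaTest):

```csharp
using UnityEngine;
using UnityEngine.Rendering;

// Sketch: push an alpha-tested (cutout) material into the AlphaTest queue (2450)
// so it is drawn after fully opaque geometry (Geometry queue, 2000).
public class ForceAlphaTestQueue : MonoBehaviour
{
    void Start()
    {
        Material mat = GetComponent<Renderer>().material;
        mat.renderQueue = (int)RenderQueue.AlphaTest; // 2450
    }
}
```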
Ok, long post, but I figured some of you developers would be interested in discussing this so we can all better understand what we're doing when optimizing our VR graphics...