Overdraw on Quest 2 tiled GPU: my findings are not consistent with the theory...

I am profiling my rendering on Quest 2 with OVRGPUProfiler to see where to optimize, and what I am seeing is not consistent with the advice and theory I have read here and there.

I think I have a fairly deep knowledge of how GPUs work and how to optimize the hell out of Unity scenes.

This thread is intended to start a conversation to better understand how the Quest 2 GPU works; it is not a simple, direct question, so feel free to skip it if you are not interested in digging deep into optimization techniques.

To summarize, what I am looking for is probably some online resource/documentation that explains this GPU in detail.

As far as I understand, the Snapdragon XR2 GPU used in the Quest is a tile-based GPU. It is supposed to take advantage of triangle binning to remove overdraw as much as possible. Advice given in several Oculus talks says that a Z prepass is useless and actually even counterproductive, since the GPU performs per-tile triangle sorting in order to execute as few pixel shaders as possible. This is also the reason why you should avoid alpha test as much as possible, as it defeats the sorting magic that minimizes pixel shading.

Anyway, I created some simple test cases to see how that works in practice. My test scene is extremely simple: it uses a single material and draws quads roughly covering the field of view. I have set it up so that I can simulate overdraw by packing a given number of quads, each with a slight z offset, into a single mesh, ordered from back to front.
So, with a single quad, I have zero overdraw, but with 8 quads I would theoretically be invoking pixel shaders for 8x the screen pixel count on a desktop GPU. If the tile-based GPU does what it is said to be good at, I would expect it to sort my polygons properly, optimize out the overdraw, and deliver performance pretty close to my single-quad test.
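To put rough numbers on that expectation, here is a back-of-the-envelope fragment-invocation estimate for the two extreme cases: no hidden-surface removal at all versus a perfect tile-deferred GPU (PowerVR-style) that shades only the front-most layer. The resolution is just the Quest 2 panel size (1832×1920 per eye); everything else is a simplifying assumption, not a measurement:

```python
# Rough fill-cost estimate for the stacked-quad test (illustrative numbers).
# Quest 2 panel: 1832x1920 per eye; the quads are assumed to cover the view.
PIXELS_PER_EYE = 1832 * 1920
EYES = 2
NUM_QUADS = 8

def fragment_invocations(layers, hidden_surface_removal):
    """Pixel-shader invocations for `layers` full-screen quads.

    With perfect per-tile hidden-surface removal (a tile-*deferred* GPU),
    only the front-most layer is shaded. Without it, and with back-to-front
    submission defeating early-z, every layer is shaded.
    """
    shaded_layers = 1 if hidden_surface_removal else layers
    return PIXELS_PER_EYE * EYES * shaded_layers

naive = fragment_invocations(NUM_QUADS, hidden_surface_removal=False)
ideal = fragment_invocations(NUM_QUADS, hidden_surface_removal=True)
print(f"back-to-front, no HSR: {naive:,} invocations")
print(f"ideal deferred tiler:  {ideal:,} invocations ({naive // ideal}x fewer)")
```

So the theory predicts an 8x gap in pixel-shading work between the two models, which is what the test is designed to expose.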

However, this is definitely NOT what OVRGPUProfiler says: I see my GPU usage jump from around 55% to over 70% when switching from my single quad to 8 quads. That performance loss can’t be caused by vertex shaders, which in my case are very simple compared to the pixel shaders (I used the Unity Standard shader with albedo, normal map, metallic, detail map, emission: everything I could use to ensure that I end up pixel-bound on the GPU).

So, what I’m getting at is this:

  • mobile GPUs “advertise” their efficiency at handling overdraw, to the point that they almost tell you not to worry about it too much
  • my tests show that I get a clear performance gain when I DO worry about overdraw

Does anyone have an explanation for that apparent paradox?

Along the same lines, my tests on more complex scenes show that a well-implemented z-prepass (in a single drawcall) is often beneficial when using “heavy” (think PBR) pixel shaders on mobile VR. In the end, “real-world” profiling is what I trust, but I’d still like to understand what is going on behind the scenes, if someone can point me to good resources on the topic.
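A toy cost model shows why a prepass can win despite touching every layer twice. The relative shader costs below are made-up placeholders (real costs depend on the shader, bandwidth, and the GPU), but the structure of the trade-off holds: the prepass pays a cheap depth-only cost per layer so that the heavy shader runs only once per pixel.

```python
# Toy per-pixel cost model for a depth prepass vs. shading every layer.
# All numbers are illustrative assumptions, not measurements.
LAYERS = 8            # overdraw depth, as in the stacked-quad test
HEAVY_FRAG = 100.0    # relative cost of the heavy PBR pixel shader (assumed)
DEPTH_ONLY_FRAG = 5.0 # relative cost of a depth-only fragment (assumed)

# No prepass, back-to-front order, no early-z help: every layer runs the
# heavy shader.
no_prepass = LAYERS * HEAVY_FRAG

# Single-drawcall depth prepass: all layers pay the cheap depth-only cost,
# then the main pass shades only the front-most surface, because early-z
# rejects every fragment that does not match the prepass depth.
with_prepass = LAYERS * DEPTH_ONLY_FRAG + 1 * HEAVY_FRAG

print(f"no prepass:   {no_prepass:.0f} (relative cost per pixel)")
print(f"with prepass: {with_prepass:.0f}")
```

With these placeholder costs the prepass path is several times cheaper; the break-even point shifts as the main shader gets cheaper or the vertex/depth work gets more expensive.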

Finally, I read that when using alpha test, I should make sure to draw that geometry AFTER opaque. Fine. That’s what Unity does by default anyway (with renderQueue usually set to 2450, when opaque is 2000). But hell, if the GPU does some magic to handle all drawcalls at once for a given tile, where does the renderQueue idea fit into that?

Ok, long post, but I figured some of you developers would be interested in discussing this so we can all better understand what we’re doing when optimizing our VR graphics…


I asked Aras on Twitter, and it seems I had missed the fact that the Snapdragon’s Adreno GPU is a tiled GPU but not a tile-deferred one. So overdraw DOES have an impact on such a GPU. This answers part of my questions. I still get the feeling that the Oculus folks are a bit misleading when they say not to perform a Z prepass → you surely don’t want to issue a drawcall for every object in the prepass, as the CPU cost is heavy, but in my case I can build a specific mesh for the prepass and handle it in a single drawcall.

This leaves one question to be answered: why is alpha test not recommended on Quest 2? I will dig into that a little more.


So even though this thread is pretty old, maybe it can be revived? I have always wished for some community investigation into mobile VR performance characteristics.

From my limited testing and real project optimization, I can definitely say that the chip does early z-reject. Though I have never tried a depth prepass (maybe I will in the future), I usually try to control the render order via the renderQueue of the materials. Big objects which I know will be in the background are drawn later, and vice versa, and I’ve seen really big performance improvements from doing that.
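The effect of draw order on early z-reject can be illustrated with a tiny software model: a single-pixel depth buffer (enough, since every layer covers the whole screen) that counts how often the "pixel shader" would run. This simulates the principle, not the actual Adreno hardware:

```python
def shaded_fragments(depths_in_draw_order):
    """Count pixel-shader invocations for full-screen layers submitted at
    the given depths, assuming a standard less-than depth test where
    early-z kills failing fragments before they are shaded."""
    depth_buffer = float("inf")
    shaded = 0
    for z in depths_in_draw_order:
        if z < depth_buffer:   # depth test passes -> pixel shader runs
            shaded += 1
            depth_buffer = z   # depth is written; later, deeper fragments
    return shaded              # now fail early-z and are never shaded

layers = [0.1 * i for i in range(1, 9)]          # 8 layers, front-most first
print(shaded_fragments(layers))                   # front-to-back: prints 1
print(shaded_fragments(list(reversed(layers))))   # back-to-front: prints 8
```

Front-to-back submission shades one fragment per pixel; back-to-front shades all eight, which matches the kind of gain render-queue ordering can deliver on hardware with early-z.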

Also, from my tests, using opaque with alpha clip in overdraw situations performs better than using transparent with additive/alpha blending. Alpha test is usually not recommended because it can break optimizations like early z-reject, but from my experience it’s still much better than just eating the overdraw.

I am actually in the process of writing my master’s thesis about shader performance on the Adreno 650 GPU (Quest 2), so in a few months I might hopefully have some deeper insights to share ^^.

I dream about the community coming together to build a benchmarking website where we can collect detailed stats about rendering performance on different devices. I am thinking about really granular tests as well, like testing where the limit of parallel texture sampling is for each GPU, or how well a GPU handles complex instructions compared to simple ones.
Maybe some day someone will build that, or I will finally find the time to do it myself.

My depth prepass is efficient mostly because I can do it in a single draw call with “not too many” vertices, and my other shaders are quite complex (PBR with detail maps). I haven’t really measured it yet, but it ‘feels’ like vertex shaders are more expensive on mobile than on desktop.

It would be even better if Meta could provide useful, in-depth documentation, so that we don’t all have to waste time doing our own benchmarks. Most of the questions developers raise should be easy to answer for the engineers who actually develop the Quest devices :,(
