For the last year we've been hitting a huge performance problem rendering with Vulkan on the Quest 2/3 compared to OpenGL ES, and it looks like the gap has existed for several years. Previously we accepted that Vulkan performed poorly and simply used OpenGL instead, but we now want to use Vulkan-specific features such as Application SpaceWarp, so this has become a major problem.
Like many mobile devices, the Quest 2 and 3 use a tile-based renderer, in which the geometry is divided across a number of small bins, each covering a limited area of the screen. This makes profiling difficult, since individual draw calls cannot be timed. However, the GPU can be profiled with Perfetto, which shows the time taken by operations such as rendering and blitting.
As far as I can tell from profiling, the cause of the performance problem is that on Vulkan the colour, depth and stencil buffers are all copied back to main memory multiple times after each bin is rendered, while on OpenGL ES only the colour buffer is copied, and only once per bin.
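To make clear what I mean by "copied back": in raw Vulkan terms, whether a tiler writes an attachment from tile memory out to main memory after each bin is controlled by the render pass storeOp. This isn't Unity's code, just a minimal sketch of how a pass that doesn't need depth/stencil afterwards would declare its attachments (formats and layouts are placeholders):

```c
#include <vulkan/vulkan.h>

/* Sketch of tiler-friendly attachment descriptions: colour is kept,
 * depth/stencil is discarded so the driver never stores it per bin.
 * Illustrative only - not Unity's actual render pass setup. */
static void fill_attachments(VkAttachmentDescription att[2])
{
    /* Colour: we need the result, so store it once per bin. */
    att[0] = (VkAttachmentDescription){
        .format         = VK_FORMAT_R8G8B8A8_SRGB,
        .samples        = VK_SAMPLE_COUNT_1_BIT,
        .loadOp         = VK_ATTACHMENT_LOAD_OP_CLEAR,
        .storeOp        = VK_ATTACHMENT_STORE_OP_STORE,        /* -> one StoreColor */
        .stencilLoadOp  = VK_ATTACHMENT_LOAD_OP_DONT_CARE,
        .stencilStoreOp = VK_ATTACHMENT_STORE_OP_DONT_CARE,
        .initialLayout  = VK_IMAGE_LAYOUT_UNDEFINED,
        .finalLayout    = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL,
    };

    /* Depth/stencil: not read after the pass, so never written back.
     * With STORE here instead, the driver has to emit a StoreDepthStencil
     * for every bin. */
    att[1] = (VkAttachmentDescription){
        .format         = VK_FORMAT_D24_UNORM_S8_UINT,
        .samples        = VK_SAMPLE_COUNT_1_BIT,
        .loadOp         = VK_ATTACHMENT_LOAD_OP_CLEAR,
        .storeOp        = VK_ATTACHMENT_STORE_OP_DONT_CARE,
        .stencilLoadOp  = VK_ATTACHMENT_LOAD_OP_CLEAR,
        .stencilStoreOp = VK_ATTACHMENT_STORE_OP_DONT_CARE,
        .initialLayout  = VK_IMAGE_LAYOUT_UNDEFINED,
        .finalLayout    = VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL,
    };
}
```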
Here are some typical examples. First we have a trace in OpenGL.
Rendering a bin takes around 100 microseconds on average. After rendering, the colour buffer must be copied back to main memory; on OpenGL this takes about 2-3 microseconds, and only one store operation is carried out per bin.
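Presumably the GLES backend avoids the depth/stencil store by invalidating those attachments before the end of the frame; on a tiler, invalidated attachments never leave tile memory. Just to show the mechanism I mean (illustrative, not Unity's actual code):

```c
#include <GLES3/gl3.h>

/* Sketch: telling GLES the depth/stencil contents are disposable, so the
 * tiler can skip their per-bin store. Something equivalent inside Unity's
 * GLES backend would explain why only StoreColor appears in the trace. */
static void discard_depth_stencil(void)
{
    const GLenum attachments[] = { GL_DEPTH_ATTACHMENT, GL_STENCIL_ATTACHMENT };

    /* Called after the last draw that needs depth testing, while the
     * framebuffer is still bound. Only the colour buffer then remains
     * to be stored per bin. */
    glInvalidateFramebuffer(GL_FRAMEBUFFER, 2, attachments);
}
```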
The next trace shows the same scene rendered with Vulkan. While the render time per bin is about the same in both cases (sometimes faster, sometimes slower depending on what's in view), on Vulkan we see a series of store operations, alternating StoreColor and StoreDepthStencil, with eight carried out for every bin. Each of these can take up to 15 microseconds, and there are gaps between them, so the stores can easily take as long as the rendering itself!
Summed over every bin, this adds up to roughly 6+ ms of additional time on Vulkan (around 64 bins × 8 stores at 10-15 microseconds each, plus the gaps between them), which as you can imagine is devastating when we only have about 14 ms in total to render a frame. This is reflected in the framerate: on OpenGL ES we can usually hit our 72 fps target, while on Vulkan we frequently drop below 60 fps.
I have tried various combinations of settings, such as switching between the Forward+ and Forward rendering paths, and changing the depth copy mode from 'After Opaques' to 'Force Prepass'. We are not sampling the depth or stencil buffers in any subsequent pass, so there is no obvious reason why they should be copied, especially as OpenGL renders an identical frame without the copy.
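For comparison, the way a hand-written Vulkan renderer would normally express "this depth/stencil buffer is never needed in main memory" is a transient, lazily allocated attachment, which on a tiler can live entirely in on-chip memory. We obviously can't set this up from Unity directly; it's only a sketch of what the backend could be doing:

```c
#include <vulkan/vulkan.h>

/* Sketch: a depth/stencil buffer that is only ever used inside the render
 * pass. TRANSIENT usage plus lazily allocated memory lets a tile-based GPU
 * keep it in tile memory, with loadOp = CLEAR and storeOp = DONT_CARE in the
 * render pass, so it is never stored per bin. Illustrative only. */
static VkImageCreateInfo transient_depth_info(uint32_t width, uint32_t height)
{
    return (VkImageCreateInfo){
        .sType         = VK_STRUCTURE_TYPE_IMAGE_CREATE_INFO,
        .imageType     = VK_IMAGE_TYPE_2D,
        .format        = VK_FORMAT_D24_UNORM_S8_UINT,
        .extent        = { width, height, 1 },
        .mipLevels     = 1,
        .arrayLayers   = 1,
        .samples       = VK_SAMPLE_COUNT_4_BIT,  /* e.g. the 4x MSAA depth buffer */
        .tiling        = VK_IMAGE_TILING_OPTIMAL,
        .usage         = VK_IMAGE_USAGE_DEPTH_STENCIL_ATTACHMENT_BIT |
                         VK_IMAGE_USAGE_TRANSIENT_ATTACHMENT_BIT,
        .sharingMode   = VK_SHARING_MODE_EXCLUSIVE,
        .initialLayout = VK_IMAGE_LAYOUT_UNDEFINED,
    };
}

/* The image's memory would then come from a heap advertising
 * VK_MEMORY_PROPERTY_LAZILY_ALLOCATED_BIT, so on GPUs that support it the
 * allocation never has to be backed by real main memory at all. */
```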
I also tried disabling MSAA. We are required to use 4x MSAA to release on the Meta store, so we need to find a way to make MSAA work, but it proved an informative experiment: without MSAA, both OpenGL and Vulkan use fewer bins (16-18 instead of 63-66), the difference between them shrinks to only 1-2 ms, and the Vulkan trace shows four rather than eight copy operations per bin:
So although the impact is smaller without MSAA, the issue is still there: Vulkan takes about 2 ms longer to render the same scene. I believe the overhead scales with the number of bins, and 4x MSAA multiplies the per-pixel data that has to fit in tile memory, so each bin covers a smaller area, the bin count goes up, and the problem becomes much more noticeable.
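For what it's worth, the usual Vulkan pattern for MSAA on a tiler is to resolve inside the render pass: the 4x colour buffer is resolved on-chip and only the single-sample result is stored, while the multisampled colour and depth attachments use storeOp = DONT_CARE. Again, this is only an illustrative sketch, not something we can wire up from Unity ourselves:

```c
#include <vulkan/vulkan.h>

/* Sketch: wiring a subpass so 4x MSAA is resolved in tile memory.
 * Attachment 0 = 4x colour (storeOp DONT_CARE), attachment 1 = 4x
 * depth/stencil (storeOp DONT_CARE), attachment 2 = 1x resolve target,
 * the only image actually written back to main memory per bin. */
static VkSubpassDescription msaa_resolve_subpass(void)
{
    static const VkAttachmentReference color_ref   = { 0, VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL };
    static const VkAttachmentReference depth_ref   = { 1, VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL };
    static const VkAttachmentReference resolve_ref = { 2, VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL };

    return (VkSubpassDescription){
        .pipelineBindPoint       = VK_PIPELINE_BIND_POINT_GRAPHICS,
        .colorAttachmentCount    = 1,
        .pColorAttachments       = &color_ref,
        .pResolveAttachments     = &resolve_ref,   /* resolve happens on-tile */
        .pDepthStencilAttachment = &depth_ref,
    };
}
```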
I found an old thread from 2021 mentioning Vulkan performance problems with Depth Priming and MSAA. I believe this is likely to be the same issue, but it doesn’t look like anyone has investigated in depth since then.
We have encountered this problem in Unity 2022 and Unity 6, but it is likely to affect other versions of Unity as well. According to the above thread, the problem was likely introduced in 2021, ironically in an attempt to improve mobile performance.
Has anyone else encountered this problem, and is there a workaround that can bring Vulkan performance up to parity with OpenGL ES 3 without sacrificing MSAA? Right now, the cost of switching to Vulkan would wipe out most of the performance we would hope to gain from Application SpaceWarp.