For the last year we've been hitting a huge performance problem rendering with Vulkan on the Quest 2/3 compared to OpenGL ES, and it looks like the gap has existed for several years. Previously we accepted that Vulkan performed poorly and simply used OpenGL instead, but we now want to use Vulkan-specific features such as Application SpaceWarp, so this has become a major problem.
Like many mobile devices, the Quest 2 and 3 use a tile-based renderer, in which the geometry is divided across a number of small bins, each covering a limited area of the screen. This makes profiling difficult, since individual draw calls cannot be timed. However, the GPU can be profiled with Perfetto, which shows the time taken by operations such as rendering and blitting.
As far as I can tell from profiling, the cause of the performance problem is that on Vulkan the colour, depth and stencil buffers are all copied back to main memory multiple times after each bin is rendered, while on OpenGL ES only the colour buffer is copied, and only once per bin.
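To make clear what I mean by "copied back": in raw Vulkan terms, whether a tiler writes an attachment from tile memory out to main memory after each bin is controlled by the render pass storeOp. This isn't Unity's code, just a minimal sketch of how a pass that doesn't need depth/stencil afterwards would declare its attachments (formats and layouts are placeholders):

```c
#include <vulkan/vulkan.h>

/* Sketch of tiler-friendly attachment descriptions: colour is kept,
 * depth/stencil is discarded so the driver never stores it per bin.
 * Illustrative only - not Unity's actual render pass setup. */
static void fill_attachments(VkAttachmentDescription att[2])
{
    /* Colour: we need the result, so store it once per bin. */
    att[0] = (VkAttachmentDescription){
        .format         = VK_FORMAT_R8G8B8A8_SRGB,
        .samples        = VK_SAMPLE_COUNT_1_BIT,
        .loadOp         = VK_ATTACHMENT_LOAD_OP_CLEAR,
        .storeOp        = VK_ATTACHMENT_STORE_OP_STORE,        /* -> one StoreColor */
        .stencilLoadOp  = VK_ATTACHMENT_LOAD_OP_DONT_CARE,
        .stencilStoreOp = VK_ATTACHMENT_STORE_OP_DONT_CARE,
        .initialLayout  = VK_IMAGE_LAYOUT_UNDEFINED,
        .finalLayout    = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL,
    };

    /* Depth/stencil: not read after the pass, so never written back.
     * With STORE here instead, the driver has to emit a StoreDepthStencil
     * for every bin. */
    att[1] = (VkAttachmentDescription){
        .format         = VK_FORMAT_D24_UNORM_S8_UINT,
        .samples        = VK_SAMPLE_COUNT_1_BIT,
        .loadOp         = VK_ATTACHMENT_LOAD_OP_CLEAR,
        .storeOp        = VK_ATTACHMENT_STORE_OP_DONT_CARE,
        .stencilLoadOp  = VK_ATTACHMENT_LOAD_OP_CLEAR,
        .stencilStoreOp = VK_ATTACHMENT_STORE_OP_DONT_CARE,
        .initialLayout  = VK_IMAGE_LAYOUT_UNDEFINED,
        .finalLayout    = VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL,
    };
}
```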
Here are some typical examples. First we have a trace in OpenGL.
Rendering a bin takes around 100 microseconds on average. After rendering, the colour buffer must be copied back to main memory; on OpenGL this takes about 2-3 microseconds, and only one store operation is carried out per bin.
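Presumably the GLES backend avoids the depth/stencil store by invalidating those attachments before the end of the frame; on a tiler, invalidated attachments never leave tile memory. Just to show the mechanism I mean (illustrative, not Unity's actual code):

```c
#include <GLES3/gl3.h>

/* Sketch: telling GLES the depth/stencil contents are disposable, so the
 * tiler can skip their per-bin store. Something equivalent inside Unity's
 * GLES backend would explain why only StoreColor appears in the trace. */
static void discard_depth_stencil(void)
{
    const GLenum attachments[] = { GL_DEPTH_ATTACHMENT, GL_STENCIL_ATTACHMENT };

    /* Called after the last draw that needs depth testing, while the
     * framebuffer is still bound. Only the colour buffer then remains
     * to be stored per bin. */
    glInvalidateFramebuffer(GL_FRAMEBUFFER, 2, attachments);
}
```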
The next trace shows the same scene rendered with Vulkan. While the render time per bin is about the same in both cases (sometimes faster, sometimes slower depending on what's in view), on Vulkan we see a series of store operations, alternating StoreColor and StoreDepthStencil, with eight carried out for every bin. Each of these can take up to 15 microseconds, and there are gaps between them, so the stores can easily take as long as the rendering itself!
Summed over every bin, this adds up to roughly 6+ ms of additional time on Vulkan (around 64 bins × 8 stores at 10-15 microseconds each, plus the gaps between them), which as you can imagine is devastating when we only have about 14 ms in total to render a frame. This is reflected in the framerate: on OpenGL ES we can usually hit our 72 fps target, while on Vulkan we frequently drop below 60 fps.
I have tried various combinations of settings, such as switching between the Forward+ and Forward rendering paths, and changing the depth copy mode from 'After Opaques' to 'Force Prepass'. We are not sampling the depth or stencil buffers in any subsequent pass, so there is no obvious reason why they should be copied, especially as OpenGL renders an identical frame without the copy.
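For comparison, the way a hand-written Vulkan renderer would normally express "this depth/stencil buffer is never needed in main memory" is a transient, lazily allocated attachment, which on a tiler can live entirely in on-chip memory. We obviously can't set this up from Unity directly; it's only a sketch of what the backend could be doing:

```c
#include <vulkan/vulkan.h>

/* Sketch: a depth/stencil buffer that is only ever used inside the render
 * pass. TRANSIENT usage plus lazily allocated memory lets a tile-based GPU
 * keep it in tile memory, with loadOp = CLEAR and storeOp = DONT_CARE in the
 * render pass, so it is never stored per bin. Illustrative only. */
static VkImageCreateInfo transient_depth_info(uint32_t width, uint32_t height)
{
    return (VkImageCreateInfo){
        .sType         = VK_STRUCTURE_TYPE_IMAGE_CREATE_INFO,
        .imageType     = VK_IMAGE_TYPE_2D,
        .format        = VK_FORMAT_D24_UNORM_S8_UINT,
        .extent        = { width, height, 1 },
        .mipLevels     = 1,
        .arrayLayers   = 1,
        .samples       = VK_SAMPLE_COUNT_4_BIT,  /* e.g. the 4x MSAA depth buffer */
        .tiling        = VK_IMAGE_TILING_OPTIMAL,
        .usage         = VK_IMAGE_USAGE_DEPTH_STENCIL_ATTACHMENT_BIT |
                         VK_IMAGE_USAGE_TRANSIENT_ATTACHMENT_BIT,
        .sharingMode   = VK_SHARING_MODE_EXCLUSIVE,
        .initialLayout = VK_IMAGE_LAYOUT_UNDEFINED,
    };
}

/* The image's memory would then come from a heap advertising
 * VK_MEMORY_PROPERTY_LAZILY_ALLOCATED_BIT, so on GPUs that support it the
 * allocation never has to be backed by real main memory at all. */
```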
I also tried disabling MSAA. We are required to use 4x MSAA to release on the Meta store, so we need to find a way to make MSAA work, but it proved an informative experiment: without MSAA, both OpenGL and Vulkan use fewer bins (16-18 instead of 63-66), the difference between them shrinks to only 1-2 ms, and the Vulkan trace shows four rather than eight copy operations per bin:
So although the impact is smaller without MSAA, the issue is still there: Vulkan takes about 2 ms longer to render the same scene. I believe the overhead scales with the number of bins, and 4x MSAA multiplies the per-pixel data that has to fit in tile memory, so each bin covers a smaller area, the bin count goes up, and the problem becomes much more noticeable.
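For what it's worth, the usual Vulkan pattern for MSAA on a tiler is to resolve inside the render pass: the 4x colour buffer is resolved on-chip and only the single-sample result is stored, while the multisampled colour and depth attachments use storeOp = DONT_CARE. Again, this is only an illustrative sketch, not something we can wire up from Unity ourselves:

```c
#include <vulkan/vulkan.h>

/* Sketch: wiring a subpass so 4x MSAA is resolved in tile memory.
 * Attachment 0 = 4x colour (storeOp DONT_CARE), attachment 1 = 4x
 * depth/stencil (storeOp DONT_CARE), attachment 2 = 1x resolve target,
 * the only image actually written back to main memory per bin. */
static VkSubpassDescription msaa_resolve_subpass(void)
{
    static const VkAttachmentReference color_ref   = { 0, VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL };
    static const VkAttachmentReference depth_ref   = { 1, VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL };
    static const VkAttachmentReference resolve_ref = { 2, VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL };

    return (VkSubpassDescription){
        .pipelineBindPoint       = VK_PIPELINE_BIND_POINT_GRAPHICS,
        .colorAttachmentCount    = 1,
        .pColorAttachments       = &color_ref,
        .pResolveAttachments     = &resolve_ref,   /* resolve happens on-tile */
        .pDepthStencilAttachment = &depth_ref,
    };
}
```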
I found an old thread from 2021 mentioning Vulkan performance problems with Depth Priming and MSAA. I believe this is likely to be the same issue, but it doesn’t look like anyone has investigated in depth since then.
We have encountered this problem in Unity 2022 and Unity 6, but it is likely to affect other versions of Unity as well. According to the above thread, the problem was likely introduced in 2021, ironically in an attempt to improve mobile performance.
Has anyone else encountered this problem, and is there a workaround that can bring Vulkan performance up to parity with OpenGL ES 3 without sacrificing MSAA? Right now, the cost of switching to Vulkan would wipe out most of the performance we would hope to gain from Application SpaceWarp.