Huge performance issue due to vSync (option that can't be disabled)

Hello there,

I have built my application to WebGL locally and connected it to Unity’s profiler.

When I am in fullscreen (4K external screen), I have huge performance issues; the framerate is very low.

Here is what I see in the profiler:

I tried to disable vSync (through code and via the Quality settings menu), but the issue persists (the profiler shows the same behaviour).

When I Debug.Log the vSync count setting, it reports a value of zero (disabled), yet the setting seems to be ignored.
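For reference, the disable path in my code is roughly this (a minimal sketch; the component name is just for illustration):

    using UnityEngine;

    public class DisableVSync : MonoBehaviour
    {
        void Start()
        {
            QualitySettings.vSyncCount = 0; // 0 = don't wait for vertical sync
            Debug.Log("vSyncCount = " + QualitySettings.vSyncCount); // prints 0, yet behaviour is unchanged
        }
    }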

I’m using a Mac with Google Chrome, Unity 2020.3.7f1.

Could someone help me solve this performance issue?

Thanks in advance,
Jonathan


Unfortunately web browsers do not allow rendering with vsync disabled. Browser compositing is always performed with vsync on. It would be possible to decouple rendering from vsync, but unlike in native 3D rendering, in browsers that would only cause dropped frames and stuttering.

I would assume that the performance difference is instead due to the increased number of pixels rendered on screen in fullscreen mode compared to windowed mode. If you go fullscreen on a display with a resolution lower than 4K, is the impact on the issue smaller?


Hi @jukka_j and thanks for your quick answer.
I initially replied right after your post the other day, but I deleted my answer as I wanted to investigate a little more.

I think you are right and that vSync is not the issue. It works much better at a smaller resolution or when I reduce the window size.

Do you have any clue how to deal with it? Some users will have a large screen or just a Retina screen, and I can't control that; I want them to be able to play the game.
I tried overriding the window.devicePixelRatio value (in JS) and decreasing
config.devicePixelRatio (initially 2, setting it to 1), which improves performance, but the UI (built with UI Toolkit) is just ugly in that case.

I also tried reducing the render scale in the Universal Render Pipeline settings, since it doesn't affect the UI, but it doesn't improve performance that much.
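For what it's worth, adjusting the render scale from script looks roughly like this (a minimal sketch; it assumes the active pipeline asset is a UniversalRenderPipelineAsset, and the class name is illustrative):

    using UnityEngine;
    using UnityEngine.Rendering;
    using UnityEngine.Rendering.Universal;

    public static class RenderScaleUtil
    {
        public static void SetRenderScale(float scale)
        {
            // Lower the internal 3D rendering resolution of the active URP asset;
            // UI Toolkit overlays are not affected by this value.
            var urp = GraphicsSettings.currentRenderPipeline as UniversalRenderPipelineAsset;
            if (urp != null)
                urp.renderScale = scale; // e.g. 0.6f
        }
    }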

Do you know how to improve performance while keeping the UI legible?

Thanks in advance for your help

First thing to double check is that the project is targeting WebGL 2 in the graphics settings. WebGL 1 will give bad performance due to the amount of emulation that URP shaders need to perform there.
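For example, a minimal Editor-script sketch that pins the build to WebGL 2 (in Unity's API, WebGL 2 corresponds to GraphicsDeviceType.OpenGLES3; the menu path is illustrative, and the same can be done in the Player settings UI):

    using UnityEditor;
    using UnityEngine.Rendering;

    public static class ForceWebGL2
    {
        [MenuItem("Tools/Force WebGL 2")]
        public static void Apply()
        {
            // Stop using the automatic API list and explicitly target WebGL 2.
            PlayerSettings.SetUseDefaultGraphicsAPIs(BuildTarget.WebGL, false);
            PlayerSettings.SetGraphicsAPIs(BuildTarget.WebGL,
                new[] { GraphicsDeviceType.OpenGLES3 });
        }
    }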

Since it is observed that the issue depends on the number of rendered pixels (i.e. a GPU fillrate/shader execution throughput issue), the next thing I would do is debug whether there might be some specific object or objects in the scene that are slow to render. Maybe there is one object with a large amount of geometry that causes a large performance impact? Or maybe there is some specific shader that is slow to render? Try to see if hiding objects uncovers a clue as to the cause of the slowdown.
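A low-tech way to do that bisecting in a build is a small helper component along these lines (a sketch; the class name and key bindings are illustrative):

    using UnityEngine;

    public class RenderBisector : MonoBehaviour
    {
        Renderer[] renderers;
        int hidden;

        void Start()
        {
            renderers = FindObjectsOfType<Renderer>();
        }

        void Update()
        {
            // H hides one more renderer per press, U restores everything,
            // so you can watch the frame rate while narrowing down the culprit.
            if (Input.GetKeyDown(KeyCode.H) && hidden < renderers.Length)
                renderers[hidden++].enabled = false;
            if (Input.GetKeyDown(KeyCode.U))
            {
                foreach (var r in renderers)
                    r.enabled = true;
                hidden = 0;
            }
        }
    }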

If all objects seem equally slow to render, try investigating whether it is possible to reduce the number of objects to render, e.g. via batching. Unfortunately instancing is out of action in URP WebGL builds, since it historically required OpenGL ES 3.1, but we have been working on getting it enabled on OpenGL ES 3.0 level hardware as well, i.e. on WebGL 2.

Even though the issue does look like it is on the GPU performance side, it can also be useful to profile the CPU side using the browser's performance profiler. Maybe there are still some hotspots there that could help, e.g. optimizing some script execution or similar.

If the project is public and you have a link, I can also give it a quick profile to see how it looks on my system.


Hi @jukka_j

First, thanks for your detailed answer and for offering your help; that's very kind of you!

Sure, the project is live at this URL: https://app.dev.dinogaia.com/ (you can show the frame rate by pressing SHIFT + F)
(you don’t need to log in; you can see the performance issues right on the home screen)

Regarding your suggestions:

  • Yes, I confirm that the target is WebGL 2.0 (with linear color space rendering).

  • I will try to investigate whether hiding objects uncovers some that are very slow to render, but it's hard to do because I have to build to WebGL every time. You might wonder why I can't test that in the Editor: performance in the Editor and in Standalone builds is much better than in WebGL, which is surprising because the docs state that Standalone and WebGL performance should be equivalent.

Moreover, I have seen that my WebGL build can also be slow on simple loading screens (containing only UI Toolkit-based UI), so without any complex objects to render.

  • I didn’t try batching, for the reasons above: there seems to be a performance issue specific to WebGL. Moreover, my scene doesn’t contain many objects (so not many batches) or animations, so I concluded that this might not be the problem. Below are the stats displayed in the Editor (for the home screen that you can see at the URL above):

[Screenshot: Editor Stats window for the home screen]

To illustrate that performance is much better on Standalone than on WebGL, you can compare the WebGL version (https://app.dev.dinogaia.com/) with the Standalone Mac app (https://static.dev.dinogaia.com/apps/dinogaia-macos.zip), though I don't know whether you use a Mac.

These two apps are two targets of the exact same codebase.
To be honest, when displaying the Standalone Mac app fullscreen on a 4K screen, it's also slow, but nothing compared to WebGL.

I will look into browser profiling (I'm not used to it yet, so I'll need to dig a bit). Do you see anything in particular on your side at https://app.dev.dinogaia.com/? (You can show the frame rate by pressing SHIFT + F.)

Thanks again for your help and support!

Surprisingly, most of the overhead isn't due to Rendering but to Scripting (see the attached screenshot).

However, the script functions being called might be rendering-related or something. I don’t have much experience in profiling WebGL applications so I might be wrong.

By the way, you can see that the screenshot is quite large; this gives an idea of the number of pixels that have to be rendered in fullscreen.

Thanks for the test case. Profiling that is a little bit tricky, since that build is missing named profiling frames for browser devtools.

Could you do a new build with the Emscripten linker flag "--profiling-funcs" enabled? That can be achieved by setting the field

PlayerSettings.WebGL.emscriptenArgs = "--profiling-funcs";

in a C# Editor script before doing the build. For example, you can use the attached script buildWebGL.cs to get a menu entry that achieves this (see the documentation at the beginning of the file).
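The core of such a script is just a few lines (a sketch, not the attached file itself; the menu path and output path are illustrative):

    using UnityEditor;

    public static class ProfilingBuild
    {
        [MenuItem("Build/WebGL with --profiling-funcs")]
        public static void Build()
        {
            // Keep function names in the generated wasm so devtools profiles are readable.
            PlayerSettings.WebGL.emscriptenArgs = "--profiling-funcs";
            BuildPipeline.BuildPlayer(EditorBuildSettings.scenes,
                "Builds/WebGL", BuildTarget.WebGL, BuildOptions.None);
        }
    }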

That will give profilers more information about those wasm-function[…] blocks in the profile.

Attachment: buildWebGL.cs (6.79 KB)


Thanks jukka_j!

I’m actually using Unity Cloud Build to deploy this WebGL app, so I’ve added the line to the PreExport method that was already set up in Cloud Build:

    // Called by Unity Cloud Build before exporting the player
    // (requires 'using UnityEditor;' in the enclosing file).
    public static void PreExport(UnityEngine.CloudBuild.BuildManifestObject manifest)
    {
        PlayerSettings.WebGL.emscriptenArgs = "--profiling-funcs";
    }

It’s currently being built; I will let you know once it’s deployed. I will also dig into your example script, as there might be other useful build options to be aware of in it. 🙂

@jukka_j so the deployment is complete with the new option. To be honest, I am not fully sure how to check whether the option has been taken into account in the build, because I hadn't dug into the function calls in the first place. Would you mind checking and telling me if you see a change? You may need to do a hard refresh.

I guess what I need to do is select a frame and check the functions taking the largest percentage of time in the Bottom Up tab; is that the right assumption?

Here is an example of what I can observe:

But again, I am not sure whether all the details provided by the profiling-funcs option are there, although I can see symbolized functions that give some clues.

What do you think?

Thanks in advance for your support; I very much appreciate it.

Thanks for the updated test case. I took it for a spin in Firefox and recorded my impressions in a video. You can view it here: 2021-05-25 DinoGaia profile.mkv - Google Drive

The captured performance profile can be viewed here: Firefox Profiler

The following items came up:

  • loading the Mecanim skeleton via .json seems to be slow. Does Mecanim support binary skeleton assets? If not, I would report that as a bug.
  • one of the slowdowns in JSON parsing is caused by a codegen issue in __atomic_fetch_add_8, which should be fixed in Unity 2021.2 Beta and newer, when that becomes available.
  • likewise, the performance penalty from populateUniformTable() should be optimized in Unity 2021.2 Beta and newer.
  • the URP _Light2DLookupTexture_GetFalloffLookupTexture() function shows up hot in the profiles. If it shows up consistently during loading, definitely file a bug on that.
  • loading A8(?) textures incurs an unexpected performance slowdown that contributes to stuttering. If you are able to isolate this from a profile into a test case, certainly go ahead and report it as a performance bug.
  • using a nonstandard FPS option in the project settings is known to cause stuttering. Reverting to the default setting (I think it is value 0 by default) should improve animation smoothness; see the sketch below.
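On that last point, if the frame rate was changed from script, a minimal sketch of the reset (the component name is illustrative; -1 is the default of Application.targetFrameRate):

    using UnityEngine;

    public class ResetFrameRate : MonoBehaviour
    {
        void Start()
        {
            // -1 restores the platform default; on WebGL that lets the browser
            // drive frames via requestAnimationFrame instead of setTimeout timers.
            Application.targetFrameRate = -1;
        }
    }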

If you can get and share a similar profile from the slow system, maybe that could be compared to the fast one above to see if the shape of the profile is somehow fundamentally different.

4 Likes

@jukka_j First off, thanks a lot for taking the time to dig into all these details. I've learnt so much from your video. I can't tell you how much I appreciate your help and support!

Here is a profile I’ve made with Firefox: Firefox Profiler

It looks like, unlike in your profile, my frames take longer to render.

Moreover, I noticed something that takes significant time on my computer that doesn’t seem to take as much time on yours:

This “wait-related” function reminds me of some cryptic things I observed in the Unity profiler when running in Play mode:


The performance is much better in the Editor, but it's still stuttering. What I can see is that much of the time is taken by "Gfx.WaitForPresentOnGfxThread". My understanding is that this means the CPU is waiting for the GPU, but I never found any clue on how to get rid of it. I concluded that this might be an incompatibility with my graphics card (my setup is below). I suspect that WaitForPresentOnGfxThread is related to what I see in the web profiler (WebGL's clientWaitSync function). What do you think?

By the way, at some point I wanted to use the Unity Profiler to profile the GPU, but unfortunately my graphics card isn't compatible with the Unity Profiler's GPU profiling.

Anyway, the game was stuttering a lot, much more than in your screen recording, even though I was not fullscreen. You can see how the WebGL app behaves on my computer:

Regarding the slow FPS I mentioned, I can tell that performance is much better in a Standalone Mac build than in WebGL (on the same machine, of course). On my iPhone it's 100% smooth; I don't see any need for optimization on mobile.

My settings:
Graphics card: Intel Iris Plus Graphics 1536 MB
Processor: 2.3 GHz Quad-Core Intel Core i7
My laptop is connected to a high-definition external screen.

To answer a few of your highlights/questions:

  • The Mecanim skeleton-related functions actually come from the Spine-Unity plugin. All the animation files are in Spine's JSON format and loaded at runtime. I will check whether I can load them as binary instead, to avoid all the JSON-decoding overhead. This could indeed have a huge impact on loading; it's a very good catch.

  • OK, I completely understand why setTimeout can make the application stutter. For the record, I initially changed the target frame rate because I noticed that Unity can reach very high frame rates (100, 150, 200, …), which I considered unnecessary and resource-wasting; I wanted to cap the frame rate at something reasonable around 60. Is that the right thing to do on other platforms (outside WebGL)? I didn't think it would make any difference on WebGL, as I assumed the frame rate would be governed by vSync. 👀
    Alright then, I will reset the target frame rate to the default value, at least for WebGL.

  • I will check the other highlights you’ve made regarding _Light2DLookupTexture_GetFalloffLookupTexture and A8 textures.

  • Regarding layout updates (which you mentioned in your video), I'm a bit surprised, as the app doesn't update anything each frame. Maybe UI Toolkit does a refresh each frame, though. I will take a deeper look at that as well.

Again, thanks a lot for your help!

That is a very interesting profile! It looks like glClientWaitSync() is indeed behaving very badly on that GPU. Is the computer some flavor of macOS? If so, can you specify the exact model? I wonder if our QA might have that exact same machine in their test bench.

One additional thing I'd love to see is whether Chrome has the identical performance issue on that system compared to Firefox. That is, I wonder whether glClientWaitSync() is just a Firefox issue here, or whether Chrome has the same general bad stuttering performance on this project. (I presume yes, since https://discussions.unity.com/t/840830 mentioned that you were testing on Chrome, but I want to double check.)

Does using an external display vs not using one affect the issue? If you just use the main display without an external display connected, does that avoid the long glClientWaitSync() issue in Firefox profile?

I think this occurs on other platforms too if vsync is disabled. The best option for both performance and conserving resources should be to enable vsync on all platforms, and keep the default setting of targeting the native display frame rate.


Here is my exact Mac model (which I "customized" when ordering from Apple to get better specs; this is not the default config):
[Screenshot: Mac model overview]
Storage:
[Screenshot: storage configuration]
Display settings:
[Screenshot: display settings]
I bought this computer last year (it's the latest generation of Mac with "good" specs; I didn't expect any performance issues).

Yes, I usually use Chrome, and I confirm that the behaviour is overall the same. I am not very familiar with its profiler, so I am not sure I can get a similar flame graph with as much detail, but overall the performance is the same. Here is the profile I exported from Chrome: https://drive.google.com/file/d/1JRw5ZW2mqHv4dP1xH9gLIdj3TMalEvIB/view?usp=sharing

Running on my laptop’s screen
Here is a profile of how it behaves on my laptop screen: https://share.firefox.dev/3oRo5tY
This is much better; we don't see clientWaitSync calls that are as long. But it's still stuttering a little (even with FrameRequestCallback instead of setTimeout; I have switched back to the default target frame rate setting).

I think it's stuttering because, for some frames, WaitForPresentOnGfxThread takes a lot longer. I noticed this in the Editor profiler: the game can be smooth overall, but at regular intervals some frames take longer because WaitForPresentOnGfxThread takes a lot of time (spikes appear in the Unity Profiler, as shown in the screenshot in my previous post).

So even if it's better on my laptop screen, I would expect better performance given the "relative simplicity" of the scene being rendered. One last thing: even though it runs better on my laptop's screen, my computer can get very noisy (I guess it might be engaging the so-called Turbo Boost or something when running the game).

How does it behave in other Unity games?
I was curious to know whether the problem was specific to my Unity project setup, so I tested other Unity games, in fullscreen:

I am surprised by such a difference.
Could a difference in configuration explain such a gap between the two games?

Ok, I will follow your advice and keep the default settings on all platforms then. Thanks for the tip.

Thanks for the detailed info.

I believe the difference is rather in the number of GameObjects rendered, and possibly also in whether URP or the built-in renderer is used: each GameObject needs a GPU buffer populated for rendering, and glClientWaitSync() relates to that usage.

One super hacky thing you can try is to take a generated uncompressed release build, and find where it has something like

function _glClientWaitSync(sync, flags, timeoutLo, timeoutHi) {
    return GLctx.clientWaitSync(GL.syncs[sync], flags, convertI32PairToI53(timeoutLo, timeoutHi));
  },

in the framework.js file, and then replace its contents with a function

function _glClientWaitSync(sync, flags, timeoutLo, timeoutHi) {
    return 0x911C; // GL_CONDITION_SATISFIED: report the fence as signaled without waiting
  },

(or alternatively return 0x911A, i.e. GL_ALREADY_SIGNALED)

That might have the effect of removing the GL waits altogether from the code, although whether that will help performance is a bit of a stretch.

In any case, I’ve added a task to investigate this on the appropriate hardware on our task board. Hopefully we can improve performance on the Intel GPUs here.


Thanks for the hacky tip; I will try that and let you know how it goes 🙂

Could you just tell me what this actually does?

I assume that if there's a clientWaitSync, it's there because it's necessary? What could the potential side effects be?

Thanks for adding the investigation on Intel GPUs to your task board 🙂

The change will stop the GL code from waiting for GL buffers to actually be freed up by the GPU before overwriting them. GPU drivers should use a shadow copy when a buffer is written to while still in the driver queue, but some GPU drivers don't do that; they block instead of shadow copying, causing worse performance instead.


Hi @jukka_j, I just wanted to add how valuable it is to see how you approach WebGL profiling here; I really appreciate you doing this. There is not much information on profiling WebGL for Unity, and the fact that you can't connect the profiler remotely to WebGL clients made it very hard for me to profile poor WebGL performance (or so I thought, but you aren't using the Unity profiler anyway). I don't want to hijack this thread with my own questions and will do my own tests first with what I've learned here. Thanks a lot for that.

But given Unity's current efforts on WebGL clients, and (maybe) official WebGL support on mobile in the foreseeable future, please consider releasing more Unity + WebGL specific resources on optimization and profiling approaches like the ones showcased here.


Hi @fxlange

No problem; I couldn't agree more that the information given by @jukka_j in this thread is very insightful, not only for me but for other Unity developers, so I am very glad that my questions could lead to answers that helped you too! I can say that I never found this kind of insightful information documented, so I really appreciate it as well 🙂

Hi @jukka_j

So I tried the hack, and even though clientWaitSync disappeared from the profile, the performance is very similar, with a bad frame rate (and my computer getting quite noisy). Here is the profile: Firefox Profiler

The URL where I hosted this version (with the hack) is here: https://app.dev.dinogaia.com/test-without-client-wait/
You can see that I applied one of your suggestions to this file (hosted uncompressed): https://app.dev.dinogaia.com/test-without-client-wait/Build/build_webgl32.framework.js

Both options (0x911C and 0x911A) give similar performance results.
As a reminder, the regular version without the hack is here: https://app.dev.dinogaia.com/

I see in the flame graph that frame rendering is sometimes triggered upon PVsync::Msg_Notify and sometimes not, but I guess that's normal because it always leads to an nsRefreshDriver::Tick.

I can't find anything in the profile that explains why it is so slow. Do you see something that could explain it?

From your experience, is it common to see much better performance in a Standalone build? I am wondering whether offering a WebGL version of my game is a good option (even though I am convinced that letting players play directly from their browser is very powerful).

What I wanted to do to optimize performance was to reduce the devicePixelRatio (from 2 to 1 on Retina displays), but the UI text became quite blurry (it does have a huge positive impact on performance, though!).

So what I tried instead was to reduce the URP render scale, which doesn't impact the UI (from 1 to 0.6 for example), but it doesn't have much effect; performance remains bad. Do you know any other simple techniques that don't make the UI blurry but can have a huge impact on performance?

Again, thanks a lot for your support and dedication.

Looking at the new profile, removing the glClientWaitSyncs did have a large impact on CPU times.

In the old profile CPU utilization was at 77%, whereas in the new profile it is down to 64%, i.e. a (77-64)/77 = 16.9% reduction in CPU utilization. That is quite a significant improvement.

However, like you mention, that improvement is not translating into real-world gains, and overall performance remains unimproved. Looking at the profiles, the 16.9% of time that was optimized away has simply shifted in the Firefox profile to PWebGL::Msg_GetFrontBuffer, meaning that the code is now just waiting longer to present. This confirms that rendering is GPU bound rather than CPU bound, which was also suggested by the fact that resizing the render target changed performance.

What makes this glClientWaitSync business more complex is that on some other GPUs, not waiting for glClientWaitSync results in stuttering behavior due to subsequent glBufferSubData()s stalling the CPU-GPU pipeline. So we need to find a way to remove the glClientWaitSyncs in a way that does not regress other GPUs.

Now, when the rendering is GPU bound and resizing affects performance, there are two likely scenarios:
a) the GPU is simply rendering too many pixels, i.e. the app is fillrate bound;
b) the GPU is not necessarily rendering too many pixels, but the shaders for those pixels are too complex, i.e. the app is pixel shader ALU or memory bandwidth bound.

One way to optimize would be to reduce the number of pixels rendered to tackle scenario a), like you are already doing. Some thoughts come to mind:

This is unfortunately true, since the DPR affects the overall rendering resolution. One thing to try here is whether switching from bilinear filtering to pixelated/point filtering gives more acceptable results, so that the UI text is more readable. This is controlled via an HTML page CSS style on the canvas element.

Check out https://developer.mozilla.org/en-US/docs/Web/CSS/image-rendering

When you write “doesn’t have much effect”, do you mean that it does not have much visual effect (for better or for worse), or it does not have much performance effect? If it does not affect visually, try to double check that the setting is working for WebGL, e.g. by setting it to 0.1 or something like that to use a really small intermediate render target.

If setting the URP scale to 0.1 does not make any impact on performance, then if we're in scenario a), the GPU fillrate is really constrained and rendering the UI alone fills enough pixels to cause the perf impact.

Or if the issue is b), then it would be good to double check whether there exists some element in the app (either in the 3D scene, or in the UI) that taxes the GPU exceptionally badly.

Another thing to double check is that URP MSAA is disabled. That can eat fillrate really badly.
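Checking or forcing that from script is a one-liner on the URP asset (a sketch, assuming the active pipeline asset is a UniversalRenderPipelineAsset; the class name is illustrative):

    using UnityEngine;
    using UnityEngine.Rendering;
    using UnityEngine.Rendering.Universal;

    public static class MsaaCheck
    {
        public static void DisableMsaa()
        {
            // 1 sample per pixel = MSAA off; higher values multiply fillrate cost.
            if (GraphicsSettings.currentRenderPipeline is UniversalRenderPipelineAsset urp)
                urp.msaaSampleCount = 1;
        }
    }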

Typically we see CPU performance being 20%-30% worse in wasm compared to native, but here the issue is not a CPU bottleneck, so that does not apply. On the GPU side we generally see performance being about the same, although given that newer GPU rendering specifications are not available on the web (looking towards WebGPU…), there are some corner cases where performance will be much worse on WebGL compared to native GL (e.g. transform feedback, memory-mapping-related synchronization). This should not be one of those cases, though, since the rendering here is very standard.


Thanks, much appreciated.

We do struggle a bit to maintain documentation on the various web development techniques that exist on the web in general and are somewhat common to web developers, but might not be common to native game and engine developers: web hosting, browser persistence and caching, security primitives, CDN load balancing and mirroring, optimization… There is a lot of web/JavaScript-specific documentation out there, but it can be hard to connect to from within the Unity realm.

We’ll keep an eye out for good material to put out that would connect these two worlds together a bit better.