Looking at the new profile, removing the glClientWaitSyncs did have a large impact on CPU times.
In the old profile CPU utilization was at 77%, whereas in the new profile it is down to 64%, i.e. a (77-64)/77 ≈ 16.9% reduction in CPU utilization. This is quite a significant improvement.
However, as you mention, that improvement is not translating into real-world gains, and overall performance remains unimproved. Looking at the profiles, the 16.9% of time that was optimized away has simply shifted in the Firefox profile to PWebGL::Msg_GetFrontBuffer, meaning that the code is now just waiting longer to present. This confirms that the rendering is GPU bound rather than CPU bound, which was also suggested by the fact that resizing the render target changed the performance.
What makes this glClientWaitSync business more complex is that on some other GPUs, not waiting on glClientWaitSync results in stuttering, because subsequent glBufferSubData() calls stall the CPU-GPU pipeline. So we need to find a way to remove the glClientWaitSyncs that does not regress those other GPUs.
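For illustration, here is a minimal sketch of the general pattern that could achieve this, expressed in WebGL2/TypeScript for readability (the real code lives on the native GL side, and the names and ring size here are hypothetical): rotate through a small ring of buffers, fence each buffer after the draws that read it, and only reuse a buffer once its fence has signaled via a non-blocking poll. Nothing ever blocks in a clientWaitSync, and GPUs whose glBufferSubData() would stall on an in-flight buffer are never asked to overwrite one.

```ts
const RING_SIZE = 3; // hypothetical; enough slots to cover frames in flight

interface RingSlot {
  buffer: WebGLBuffer;
  fence: WebGLSync | null; // null => the GPU is done with this buffer
}

function createRing(gl: WebGL2RenderingContext, byteSize: number): RingSlot[] {
  return Array.from({ length: RING_SIZE }, () => {
    const buffer = gl.createBuffer()!;
    gl.bindBuffer(gl.ARRAY_BUFFER, buffer);
    gl.bufferData(gl.ARRAY_BUFFER, byteSize, gl.DYNAMIC_DRAW);
    return { buffer, fence: null };
  });
}

// Returns a slot that is safe to overwrite, or null if every slot is still
// in flight. SYNC_STATUS is a non-blocking query, unlike a clientWaitSync
// call with a non-zero timeout.
function acquireSlot(gl: WebGL2RenderingContext, ring: RingSlot[]): RingSlot | null {
  for (const slot of ring) {
    if (slot.fence === null) return slot;
    if (gl.getSyncParameter(slot.fence, gl.SYNC_STATUS) === gl.SIGNALED) {
      gl.deleteSync(slot.fence);
      slot.fence = null;
      return slot;
    }
  }
  return null;
}

function uploadVertexData(gl: WebGL2RenderingContext, ring: RingSlot[], data: Float32Array): void {
  const slot = acquireSlot(gl, ring);
  if (slot === null) return; // skipping one upload beats stalling the pipeline
  gl.bindBuffer(gl.ARRAY_BUFFER, slot.buffer);
  gl.bufferSubData(gl.ARRAY_BUFFER, 0, data);
  // ... issue the draw calls that read slot.buffer here ...
  slot.fence = gl.fenceSync(gl.SYNC_GPU_COMMANDS_COMPLETE, 0);
}
```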
Now when the rendering is GPU bound and resizing affects performance, there are two likely scenarios:
a) GPU is simply rendering too many pixels, i.e. the app is fillrate bound,
b) GPU is not necessarily rendering too many pixels, but the shaders in those pixels are too complex, i.e. the app is pixel shader ALU or memory bandwidth bound.
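To put scenario a) in numbers, here is a quick back-of-envelope sketch (all sizes hypothetical) of how the pixel count grows with the square of both DPR and the URP render scale, which is why resizing the render target moves the needle:

```ts
const cssWidth = 1920, cssHeight = 1080; // hypothetical canvas CSS size
const dpr = window.devicePixelRatio;     // e.g. 2 on a HiDPI display
const renderScale = 0.5;                 // hypothetical URP render scale

const backbufferPixels = cssWidth * dpr * cssHeight * dpr;
const scenePixels = backbufferPixels * renderScale * renderScale;
// At dpr = 2: backbuffer ≈ 8.3 Mpix per frame, while the 3D scene target at
// scale 0.5 is ≈ 2.1 Mpix, before counting overdraw or MSAA resolve costs.
console.log({ backbufferPixels, scenePixels });
```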
One way to optimize would be to reduce the number of pixels rendered to tackle scenario a), like you are already doing. Some thoughts come to mind:
This is unfortunately true, since DPR affects the overall rendering resolution. One thing to try here is to test whether switching from bilinear filtering to pixelated/point filtering would give more acceptable results, so that the UI text stays readable at a lower resolution. This is controlled via an HTML page CSS style on the canvas element.
Check out https://developer.mozilla.org/en-US/docs/Web/CSS/image-rendering
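For example, something along these lines (a sketch; the selector is illustrative and depends on how the page names its canvas, and pixelated support varies a bit per browser):

```ts
// Let the browser upscale a lower-resolution WebGL backbuffer with point
// sampling instead of bilinear filtering.
// Equivalent CSS: canvas { image-rendering: pixelated; }
const canvas = document.querySelector('canvas')!;
canvas.style.setProperty('image-rendering', 'pixelated');
```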
When you write “doesn’t have much effect”, do you mean that it does not have much visual effect (for better or for worse), or that it does not have much performance effect? If it does not affect things visually, double check that the setting is actually working for WebGL, e.g. by setting it to 0.1 or something like that to use a really small intermediate render target.
If setting the URP render scale to 0.1 does not make any impact on performance, then either we are in scenario a) with the GPU fillrate genuinely constrained, and it is the UI, which the URP render scale does not shrink, that fills too many pixels and causes the perf impact.
Or, if the issue is b), then it would be good to double check whether there exists some element in the app (either in the 3D scene, or in the UI) that taxes the GPU exceptionally badly.
Another thing to double check is that URP MSAA is disabled. That can eat fillrate really badly.
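As a quick browser-side sanity check for context-level antialiasing (note this only sees the default framebuffer; URP's own MSAA setting on its intermediate render targets has to be verified in the URP asset itself, and the selector below is again illustrative):

```ts
// Requesting the same context type returns the already-created context.
// If the build created a WebGL 1 context instead, this returns null.
const canvas = document.querySelector('canvas')!;
const gl = canvas.getContext('webgl2') as WebGL2RenderingContext;
console.log('context antialias:', gl.getContextAttributes()?.antialias);
console.log('default framebuffer samples:', gl.getParameter(gl.SAMPLES));
```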
Typically we see CPU performance being 20%-30% worse in wasm compared to native, but the issue here is not a CPU bottleneck, so that does not apply. On the GPU side we generally see performance being about the same, although given that newer GPU rendering features are not available on the web (looking towards WebGPU…), there are some corner cases where performance will be much worse on WebGL compared to native GL (e.g. transform feedback and memory-mapping related synchronization). This should not be one of those cases, though, since the rendering here is very standard.