Browser: Edge 136.0.3240.76 (arm64)
Test Device: Mac mini M4 Pro 12+16 (macOS 15.3.1)
Unity Version: 6000.0.42f1
I created a test demo designed for minimal overhead, rendering cubes without optimization techniques like Static Batching, Dynamic Batching, or SRP Batcher. Each cube is rendered in its simplest form, having only a color material applied. The scene uses a single Directional Light source.
When building WebGPU and WebGL versions of this application, I expected WebGPU to achieve frame timings at least comparable to WebGL. However, the WebGPU version exhibited nearly double the frame time, significantly deviating from my expectations.
- Profiling Results: Further analysis using browser profiling tools revealed substantial time spent within “BindGroupManager” related operations in WebGPU (though their specific purpose and necessity are unclear to me). Specifically:
BindGroupManagerWebGPU::UpdateConstantBuffer: 24% CPU Time
BindGroupManagerWebGPU::Bind: 9.1% CPU Time
- Frame Capture Analysis: Frame capture comparisons show that WebGPU performs significantly more data writes than WebGL before issuing draw calls. This suggests that WebGPU may be writing the entire BindGroup data structure on every update, whereas the WebGL path seems able to write only the values that changed (partial/delta writes). Such behavior would place a heavier burden on memory bandwidth.
My question: Is this difference primarily due to incomplete optimization in Unity 6’s WebGPU implementation, or is it a deliberate design choice stemming from architectural constraints or other considerations specific to WebGPU?
I have been digging into the WebGPU backend for a while now, and I really want to know whether WebGPU can be at least on par with WebGL in every scenario. If not, what would the best practices for WebGPU be?
I’m sorry, I’m not a native English speaker and can only use machine translation to express my opinions. If anything in my statements is unclear, please feel free to point it out.
This is a screenshot of the WebGPU runtime, showing the frame time, number of set pass calls, and draw calls in the upper left corner. The WebGL build looks the same, except that the frame time shown in the upper left corner is only 35.6 ms.
WebGPU is a newer and in many ways more optimal API, but WebGL has a head start of many years on optimization. There is definitely room for improvement with WebGPU, both in how we call into WebGPU and in the browsers’ implementations of it.
It makes me happy to see you making use of my inspector tool.
In the above frame analysis, the BindGroup does have an offset of 0 for the two buffers, but that is because it is using dynamic offsets. It allocates one big buffer (the ScratchBuffer), each uniform buffer gets a chunk of that big buffer, and the offset of that chunk is passed to the draw call. This avoids needing to create lots and lots of different BindGroups.
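To make the dynamic-offset idea concrete, here is a minimal sketch of a linear allocator that hands out aligned chunks of one large buffer. It is not Unity's actual ScratchBuffer code: the `ScratchAllocator` class and its methods are hypothetical names, and the 256-byte alignment is assumed from WebGPU's default `minUniformBufferOffsetAlignment` limit. No GPU is involved; it only computes offsets.

```typescript
// Assumption: the default WebGPU limit minUniformBufferOffsetAlignment (256 bytes).
const OFFSET_ALIGNMENT = 256;

/** Rounds `value` up to the next multiple of `alignment`. */
function alignUp(value: number, alignment: number): number {
  return Math.ceil(value / alignment) * alignment;
}

/** Hypothetical linear allocator over one big uniform buffer. */
class ScratchAllocator {
  private cursor = 0;
  private capacity: number;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  /** Reserves `byteSize` bytes and returns the chunk's dynamic offset. */
  allocate(byteSize: number): number {
    const offset = alignUp(this.cursor, OFFSET_ALIGNMENT);
    if (offset + byteSize > this.capacity) {
      throw new RangeError("scratch buffer exhausted");
    }
    this.cursor = offset + byteSize;
    return offset;
  }

  /** Rewinds the allocator, e.g. at the start of a frame. */
  reset(): void {
    this.cursor = 0;
  }
}

// Three draws with 80 bytes of uniforms each share one buffer and one BindGroup.
const scratch = new ScratchAllocator(64 * 1024);
const offsets = [80, 80, 80].map((size) => scratch.allocate(size));
console.log(offsets); // [0, 256, 512]
```

In a real renderer, each returned offset would be passed as the dynamic offset in `pass.setBindGroup(index, bindGroup, [offset])`, while the binding stored in the BindGroup itself keeps an offset of 0, which matches what the frame capture shows.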
You are correct about the heavier bandwidth requirement that comes from using uniform buffers instead of individual uniform calls. Hopefully that will improve over time. WebGPU does have some design limitations that give other APIs an advantage, but those APIs do not need the device reach that WebGPU has, so WebGPU is more conservative in its features. One example is persistently mapped buffers, which WebGPU does not support (and which some devices do not support even when the API, such as Vulkan, does).
I would say it’s not simple to do a 1:1 performance comparison with WebGL. WebGPU can do things that WebGL is incapable of, and the same is true for WebGL. That said, WebGPU will continue to improve, and WebGL will stay fixed in terms of features and performance.
Thank you for your reply. I really appreciate your answer, and your inspector tool has helped me a lot. 
I still have two questions. 
- I have heard this suggestion for WebGPU: “Set state as infrequently as possible, and break state up based on how frequently it changes.” In many cases, WebGL only needs to update a small amount of necessary data, but WebGPU updates the entire uniform buffer. Will BindGroups be refined based on change frequency in the future? I think this might reduce bandwidth consumption considerably.
- Leaving aside the more advanced features of WebGPU, I found that its performance advantage over WebGL often comes from reduced API call time. However, when API calls are not the performance bottleneck (as shown in the picture below), WebGPU can be noticeably slower than WebGL. Does that mean WebGPU always spends more CPU time on organizing and managing data?
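As a rough illustration of the first question, the sketch below compares the bytes uploaded when every draw rewrites one combined uniform block against a layout split by change frequency. The byte sizes and function names are assumptions chosen for the example, not Unity's real uniform layout, and the code only does arithmetic.

```typescript
// Hypothetical uniform sizes in bytes (assumed for illustration only).
const PER_FRAME_BYTES = 256;   // camera and light data: changes once per frame
const PER_MATERIAL_BYTES = 64; // material color: changes once per material
const PER_DRAW_BYTES = 64;     // object matrix: changes on every draw

/** Bytes written per frame if each draw rewrites one combined uniform block. */
function bytesSingleBuffer(draws: number): number {
  return draws * (PER_FRAME_BYTES + PER_MATERIAL_BYTES + PER_DRAW_BYTES);
}

/** Bytes written per frame if each group is written only when it changes. */
function bytesSplitByFrequency(draws: number, materials: number): number {
  return (
    PER_FRAME_BYTES +
    materials * PER_MATERIAL_BYTES +
    draws * PER_DRAW_BYTES
  );
}

// 10,000 cubes sharing a single material:
console.log(bytesSingleBuffer(10_000));        // 3840000
console.log(bytesSplitByFrequency(10_000, 1)); // 640320
```

Under these assumed sizes, splitting by frequency writes about one sixth of the data, because the per-frame and per-material groups are no longer repeated for every draw.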
- These uniform buffer writes are definitely on my list of things to keep investigating. Vulkan and DX12 have similar issues, and they have been able to refine their handling over time. The browser vendors have discussed improving memory throughput by directly mapping backend C++ or even GPU memory to WASM memory, for cases like mapping buffers for write, but this is still some time off and it’s not certain we would be able to make use of it. Reducing the number of bytes written is the ultimate goal.
- I’d say the advantage of WebGPU is that the CPU work of organizing and managing data can be done upfront, by defining it in objects that can be reused. For example, with WebGL you can create a shader program, but it doesn’t actually compile the shader to the native backend at that time; that doesn’t happen until the first draw call. With WebGPU you can create a render pipeline object and the shader will be compiled at that time. That means you can do upfront shader warmups much more easily with WebGPU than with WebGL, reducing runtime stalls. On the other hand, GLSL shaders tend to be a bit smaller than WGSL shaders, and the GLSL compiler in ANGLE has been around for over 10 years and has had more time to be optimized than the WGSL compiler. So it’s never easy to do performance comparisons.
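The upfront warmup described above could look roughly like the following sketch. `createRenderPipelineAsync` is a real `GPUDevice` method, but the `PipelineDevice` interface and the `warmUpPipelines` helper are hypothetical names introduced here so that the example type-checks without the WebGPU type definitions. It is one possible approach, not how Unity does it.

```typescript
/** Minimal structural view of the one GPUDevice method this sketch needs. */
interface PipelineDevice<Desc, Pipeline> {
  createRenderPipelineAsync(descriptor: Desc): Promise<Pipeline>;
}

/**
 * Requests every pipeline up front (for example during a loading screen)
 * and resolves once all of them are ready to be used in draw calls.
 */
function warmUpPipelines<Desc, Pipeline>(
  device: PipelineDevice<Desc, Pipeline>,
  descriptors: Desc[],
): Promise<Pipeline[]> {
  // All requests are issued immediately; the browser may compile them concurrently.
  return Promise.all(
    descriptors.map((descriptor) => device.createRenderPipelineAsync(descriptor)),
  );
}
```

With a real `GPUDevice`, the descriptors would be `GPURenderPipelineDescriptor` objects, and the resolved pipelines could be cached and handed to `setPipeline` later without triggering compilation at the first draw.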
These kinds of stress tests, which scale up draw calls and state changes to expose the strengths and weaknesses of APIs, reflect conditions that come up less often in real-world projects, and techniques like SRP Batcher are designed to help with this type of content in real projects. But it’s still a reasonable way to find where the weaknesses of APIs start to reveal themselves.
In the end, yes, WebGPU still has a lot of room for improvement. The projects people have been making with Unity WebGPU have been invaluable resources for the browser developers (Google, Apple, Mozilla) in pushing their WebGPU implementations forward: they reveal clear places where the browsers can optimize their implementations, as well as opportunities for us to optimize our use of the API. I was just working with Apple recently on a bigger Unity project, and they were able to improve Safari performance 2x because they had some performance issues with indexed draw calls. I personally find it really exciting that we’re working together to push forward graphics and gaming on the web!