Performance bug? Hybrid Renderer V2 even slower than the GameObject renderer?

Recently I have been doing some research on HRV2 + URP and trying to update the Megacity demo to test its performance on a mid-range Android device. However, for some reason the performance is worse than the original Megacity demo, which uses Hybrid Renderer V1. I then ran a performance comparison and found that HRV2 is even slower than the GameObject renderer. Is this a bug, or did I set up my test project incorrectly?

Here is the test scene:
7644634--952729--upload_2021-11-10_20-33-48.png

Both tests ran on Android (Mi MIX 2S, Snapdragon 845), IL2CPP + ARM64 + Development Build + Vulkan
Entities / URP / Hybrid Renderer versions are the latest
Burst is on, Leak Detection is off

A. GameObject-Based
CPU rendering code costs about 6.4 ms
7644634--952726--upload_2021-11-10_20-32-53.png

B. HybridRenderV2 + URP
CPU rendering code costs about (3.91 + 4.9) ms, i.e. roughly 8.8 ms in total
7644634--952723--upload_2021-11-10_20-32-42.png

Additional comparison in the Editor (PC, i7-9700):
HybridV2+URP vs. HybridV1+HDRP
7644634--952735--upload_2021-11-10_20-36-27.jpg

I need some help. Thanks very much!
@SebastianAaltonen @arnaud-carre

Hello!

This is definitely an unexpected result. In our internal tests we have typically seen HRV2 outperform both GameObjects and HRV1, both in Megacity and in other benchmarks. It's possible that this is a bug or that something has been configured incorrectly.

The HRV2 capture has two specific things of note:

  • There are large WaitForPresent / Present spikes in the frame, which suggests that the scene might be bottlenecked by GPU rendering instead of CPU. If possible, it would be good to take a GPU performance capture from both versions and look at the GPU side difference.
  • UpdateHybridChunksJob takes a lot of time. If you are not modifying any entities each frame, then this could indicate a very high chunk count.

Looking at the HRV1/HRV2 comparison, we can see that in the HRV2 version the chunk utilization is extremely bad, and the vast majority of entities seem to be in single entity chunks. We also see the HybridBatchPartition component, which is used to force this situation in order to sort transparencies correctly. I would suggest checking two things:

  • Make sure that your Materials are not marked as transparent unnecessarily. Transparent materials cause Hybrid Renderer to add HybridBatchPartition in order to sort transparencies correctly, but it costs a lot of performance. Regular opaque materials should not have it. It is also possible to disable this behavior with the DISABLE_HYBRID_TRANSPARENCY_BATCH_PARTITIONING scripting define, in which case you might see transparency ordering issues (in HRV1 you will always get this behavior, it cannot render sorted transparencies correctly).
  • Try profiling with incremental conversion disabled. This has been known in some cases to increase chunk fragmentation.
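To get a feel for why single-entity chunks hurt, here is a rough back-of-the-envelope sketch in Python (it is not Unity code). It assumes that per-chunk work such as UpdateHybridChunksJob scales with the number of chunks, as suggested above. The entity count and per-chunk capacity are made-up illustrative numbers, not values taken from the Megacity capture, and `chunk_count` / `utilization` are hypothetical helper names.

```python
import math

def chunk_count(entity_count: int, chunk_capacity: int, partitioned: bool) -> int:
    """Estimate how many chunks one archetype occupies.

    partitioned=True models the situation seen in the capture, where
    every entity ends up alone in its own chunk.
    """
    if entity_count == 0:
        return 0
    if partitioned:
        return entity_count
    return math.ceil(entity_count / chunk_capacity)

def utilization(entity_count: int, chunk_capacity: int, partitioned: bool) -> float:
    """Fraction of the available chunk slots that actually hold an entity."""
    chunks = chunk_count(entity_count, chunk_capacity, partitioned)
    if chunks == 0:
        return 0.0
    return entity_count / (chunks * chunk_capacity)

ENTITIES = 10_000  # hypothetical scene size
CAPACITY = 100     # hypothetical entities per chunk for this archetype

packed = chunk_count(ENTITIES, CAPACITY, partitioned=False)
fragmented = chunk_count(ENTITIES, CAPACITY, partitioned=True)

print(f"packed:     {packed} chunks, "
      f"utilization {utilization(ENTITIES, CAPACITY, False):.0%}")
print(f"fragmented: {fragmented} chunks, "
      f"utilization {utilization(ENTITIES, CAPACITY, True):.0%}")
```

Under these assumptions the fragmented layout has 100 times as many chunks to visit each frame, which is the kind of difference that would show up as a long UpdateHybridChunksJob.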

I attached an edited version of the HRV1/HRV2 comparison picture to highlight where the problem is visible.

7644691--952741--profile2.jpg

Thanks so much for the quick reply, @JussiKnuuttila!
I have now used the DISABLE_HYBRID_TRANSPARENCY_BATCH_PARTITIONING scripting define and am not working with LiveLink (does that disable incremental conversion?). The test results improved a lot: the rendering cost is lower and chunk utilization is better, but overall it is still not good.

HybridRenderV2 + URP
CPU rendering code costs about (2.73 + 5.38) ms; the first figure improved by about 1.2 ms
Still slower (or at least not faster) than GameObject rendering
7646572--953158--upload_2021-11-11_10-9-36.png
Editor (PC, i7-9700), HybridV2+URP:
1.45 ms -> 0.59 ms,
but still slower than HRV1 (0.07 ms)
7646572--953161--upload_2021-11-11_10-11-25.png

If necessary, I can file a bug report with the test repo. Thanks!

It is expected that HRV2 will always have some small baseline CPU cost, because it checks every chunk for changed data. HRV1 permanently classifies batches as static or dynamic, and has a very low CPU cost for static batches and a very high CPU cost for dynamic ones. In addition, adding new static batches also carries a high cost in HRV1. HRV2 does all of this automatically.

When comparing against HRV1, it's a good idea to profile total frame times instead of just system update times, as HRV1 also has a significantly higher draw call setup cost compared to HRV2, because it reuploads instance data from CPU to GPU each frame, even for static batches. In 100% static scenes, we would expect HRV1 to be slightly faster on the DOTS system side (like in your capture), but slower on the main thread rendering and render thread side.

We could consider adding some kind of feature (for example, a tag component) to mark entities as completely static and have Hybrid Renderer skip checking them for changed data to reduce this overhead. We will also continue to try to optimize the system update cost in general. If you want to test this kind of change yourself, you can try altering the query used by the Hybrid Renderer for UpdateHybridChunksJob to exclude entities with a certain tag component, and then add that tag component to your static entities.
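As a toy illustration of the idea (plain Python, not Hybrid Renderer code), the sketch below models a system that visits every chunk and compares a change version against the last version it processed, and then the same system with a hypothetical static tag excluded from its query. The `Chunk` class, the `is_static` flag, and the version numbers are all assumptions made up for this example.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    change_version: int      # bumped whenever something writes to the chunk
    is_static: bool = False  # hypothetical "static" tag on the chunk's entities

def update_chunks(chunks, last_system_version, skip_static):
    """Return (visited, updated) chunk counts for one system update."""
    visited = 0
    updated = 0
    for chunk in chunks:
        if skip_static and chunk.is_static:
            continue  # excluded by the query, so it is never even checked
        visited += 1
        if chunk.change_version > last_system_version:
            updated += 1  # data changed since the last update
    return visited, updated

# 1000 static chunks last written at version 1,
# plus 10 dynamic chunks written at version 5.
chunks = [Chunk(1, is_static=True) for _ in range(1000)]
chunks += [Chunk(5) for _ in range(10)]

baseline = update_chunks(chunks, last_system_version=4, skip_static=False)
with_tag = update_chunks(chunks, last_system_version=4, skip_static=True)

print("without tag (visited, updated):", baseline)
print("with tag    (visited, updated):", with_tag)
```

In this model both variants update the same 10 chunks, but the tagged version only visits 10 chunks instead of 1010. The trade-off is that any later change to a tagged entity would go unnoticed, so the tag would only be safe on entities that really never change.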

Thanks very much, I will try it later. By the way, is what my capture shows (HRV2 being slower than GameObject rendering in my case) a performance bug, or is it by design?

It is definitely unexpected, so I guess we could say it's a performance bug. In the capture, HRV2 rendering is faster with the actual rendering (DoRenderLoop), but has the added DOTS system overhead which is more than the difference. If the DOTS system overhead is optimized more, it would be faster again.

Possibly this overhead could be reduced by adding a way to opt out some entities from data update checking, e.g. via a tag component.

Do you need me to submit a bug report with the repo so you can fix that?

@JussiKnuuttila

I don't think this specific case (GameObject vs HRV2) requires a formal bug report. We will keep optimizing the Hybrid Renderer to try to make it faster. Thanks a lot for bringing this to our attention though!

Thank you! I’m really looking forward to your update