Even poorer performance for ECS Hybrid Renderer compared to traditional GameObjects?

https://github.com/SetoKaiba/TestECS
Here’s the GitHub repository to test with.
I made a simplified project for benchmarking.
The traditional GameObjects approach performs better.
The ECS Hybrid Renderer approach performs worse.

Both instantiate 1000 different meshes (clones, so they’re distinct mesh objects) for rendering.
So the rendering is not instanced.
I just want to compare the baseline performance without instancing.

The current ECS approach performs worse with non-instanced rendering.
Is that intended?
Or am I missing something?
RenderMeshSystemV2 costs too much CPU time.



Try activating GPU Instancing in the Material.
I recently tested rendering 10000 cubes with ECS (HDRP on 2019.1). With an old Radeon HD 7850 I get 80-100 fps in the Editor.

@runner78 What I’m discussing in this thread is a comparison of the basic rendering performance.
I mean non-instanced rendering.
Is the poorer performance intended?
Or am I missing something?
RenderMeshSystemV2 costs too much CPU time.

Can you try spawning the cubes so they are not all in the same place?
Material GPU Instancing reduces draw calls and saves CPU time. In one of my tests I also simulated and displayed 200000 simple meshes at the same time at 30 fps, which is not possible with the old GameObjects.

ECS was designed for parallelism. When you try to parallelize a task which cannot run in parallel, you inherit all the overhead of the architecture without any of the benefits. You are showing us a contrived example of having 1000 objects managed by the same system which cannot be batched together. This isn’t really what ECS was designed for. Also, the CPU overhead for this worst-case scenario could be improved in the future as the API matures.

1000 meshes shouldn’t take that long, even though it’s the worst case, right? Did you look at a build’s performance instead of editor performance?

That’s a misconception. Although it greatly benefits from the parallelism, ECS is not inherently about parallel processing. It’s about linear processing without a lot of extra data in-between. When they talk about tight loops, they talk about this.
What is designed for parallel processing is the Job system.
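
A minimal sketch of what "linear processing" can mean here, assuming two hypothetical components (positions and velocities) stored as packed, contiguous arrays. The names and the `integrate` function are illustrative only and are not part of the Unity API; Python is used purely to show the data layout, not the performance.

```python
from array import array

# Hypothetical component data: one packed float array per component type,
# so a system walks memory front to back with no per-object data in between.
positions = array("f", [0.0, 1.0, 2.0])
velocities = array("f", [1.0, 1.0, 0.5])

def integrate(positions, velocities, dt):
    """Tight loop: advance every position by its velocity, in index order."""
    for i in range(len(positions)):
        positions[i] += velocities[i] * dt
```

Nothing in this loop is parallel. Splitting the index range across worker threads is the separate concern that the Job system addresses.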

There would really be no point in taking all the time and effort in migrating over to ECS if you’re not taking advantage of, at least to some extent, the Job system and the Burst compiler. I realize that these are separate components and can be used independently. But to suggest they weren’t intended and designed to work together is a misconception. Even the term “Pure ECS” implies use of the job system.

RenderMeshSystemV2 is built around the concept of instancing.
Testing it against the legacy system with no instancing means you’re looking at it with wrong expectations to begin with.
Of course it will perform worse - it is simply not built for that purpose, because it goes to great lengths to batch draw calls.
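
As a rough illustration of why unique meshes defeat batching, here is a toy model (not how RenderMeshSystemV2 is actually implemented) that groups objects by a hypothetical (mesh, material) key. It assumes one draw call per distinct key when instancing is on, and ignores real-world limits such as maximum instances per batch.

```python
from collections import defaultdict

def count_draw_calls(objects, instancing=True):
    """objects: list of (mesh_id, material_id) pairs (hypothetical ids)."""
    if not instancing:
        # Without instancing, every object is drawn on its own.
        return len(objects)
    batches = defaultdict(int)
    for mesh_id, material_id in objects:
        # Objects sharing mesh and material fall into the same batch.
        batches[(mesh_id, material_id)] += 1
    return len(batches)

# 1000 copies of the same mesh vs. 1000 cloned (unique) meshes
shared = [("cube", "mat")] * 1000
unique = [(f"cube_clone_{i}", "mat") for i in range(1000)]
```

In this model `shared` collapses to a single batch, while `unique` still produces 1000 draw calls, so any grouping work done for the unique case is pure overhead.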

IMHO when you do have a scenario where you have to have 1000 unique meshes generating 1000 draw calls - then arguably you’re simply doing things wrong.
And I’m not even going into the memory implications of keeping 1000 unique meshes in memory… or the terrible memory fragmentation due to 1000 unique SharedComponents… and many more bad things.

So in my opinion your test 1. isn’t a fair comparison in the first place and 2. artificially assumes a scenario that should never exist in a real application if things have been put together properly.

Straw-man says hello.

My use case is a voxel terrain system, which generates unique meshes per chunk. It handles about 10k unique mesh draw calls at 60 fps using the standard MeshFilter/MeshRenderer combo. That’s 10x the performance of what the OP has posted, while rendering actual meshes in an actual project instead of this test.

On topic; is there a chance this is an edge case with the data structures used by the renderer? Like some grouping by world position for batching/culling running into issues due to all those meshes being at the same place?

I ran the test with 20000 cubes without instancing, on 2 monitors, one showing the Game view, the other the Scene view.

GameObject: 25-30 FPS
Entities: 15 FPS
Entities with Jobs leak detection deactivated and Burst safety checks off: 55-60 FPS

@james_unity988 @Chris-Herold No. You’re not correct. Like @Zuntatos is saying, I’m trying to migrate my voxel terrain system to ECS. I can take advantage of Jobs to generate the terrain asynchronously. Officially, ECS has been declared a replacement for the traditional GameObject system rather than a complement. So I think that, at least for non-instanced rendering, it should have equal performance. At the very least, it should not be worse.

@Zuntatos Yes. That’s what I’m trying to express. I’m migrating my voxel terrain system to ECS as well.

@runner78 Thank you. I found that out as well. I tried testing with standalone players instead. The performance is better, but the performance of ECS is still unstable. ECS runs at 14-22ms while GameObjects run at 16ms. Can you please share your benchmark on GitHub? Thank you.
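
To compare these numbers with the FPS figures quoted earlier in the thread, a small conversion sketch, assuming the millisecond values are per-frame times (the post does not say so explicitly). The function name is hypothetical.

```python
def fps_from_frame_time_ms(frame_time_ms):
    """Convert a per-frame time in milliseconds to frames per second."""
    return 1000.0 / frame_time_ms

gameobjects_fps = fps_from_frame_time_ms(16)  # 62.5
ecs_best_fps = fps_from_frame_time_ms(14)     # about 71.4
ecs_worst_fps = fps_from_frame_time_ms(22)    # about 45.5
```

Under that assumption, ECS swings between roughly 45 and 71 fps while GameObjects hold steady at about 62 fps, which matches the "unstable" description.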

Did you find any solutions? I am facing the same issue with my planetary terrain system. Due to its nature it consists of only unique meshes.

Still looking for a solution.

Avoid RenderMesh altogether and build your own rendering system. There are literally dozens of threads in this subforum on this topic. Use the search bar if you want ‘inspiration’ on that.

Exactly what I did.