Trying to get 30 FPS with 1 million entities

Hi All,

Short Version:
After installing DOTS, I’ve been trying to instantiate 1,000,000 instances and keep the Unity player stable at 30 FPS, but it’s running at 0.4 FPS.

Long Version:
I bet this topic has already been discussed somewhere, but after looking for it on the internet I couldn’t find the answers. Nothing better than trying it myself, then! :)
After experimenting with instantiating prefabs (ECS prefabs), I wonder how it would even be possible to render millions of instances on screen, since Unity already struggles with 1 million. I’m trying to find out how many instances we can handle on screen after optimization (I’m aiming to build a large-scale simulation).
Now, I reckon that’s a lot more than the old MonoBehaviour approach could handle, but still: the Unite Austin presentation mentioned that ECS can handle millions of instances on screen, so I’d like to know HOW.

So I created a project, followed the manuals and forums and converted MonoBehaviour prefabs to Entity prefabs - currently using the Hybrid v2 Renderer.

This is what I’ve got:

Little buddy here is the mesh to be rendered - 1.7k tris according to Unity.

Then I wrote the code to convert it to an ECS entity prefab and instantiated 1,000,000 instances:

Yeah, that white plane is actually 1 million instantiated copies of the prefab (it took Unity around 2 minutes to start running). Note the FPS: 0.4 with the editor open; if I maximize it, it goes to 0.7 FPS :-D

And this is what happened with the memory:

Then the last thing is what happened to the Systems:

There’s a Hybrid Renderer system that takes about 100 ms.
I wonder if that’s where the problem is? If it took 100 ms per instance that would indeed kill performance, but I’m not sure what that number means.
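A quick back-of-the-envelope check helps read that number: system timings in the profiler / Entity Debugger are normally reported per frame for all the entities the system processes, not per instance. The sketch below is plain C# with no Unity APIs (the class and method names are made up for illustration); it turns a timing into a frame-rate ceiling and an average per-entity cost:

```csharp
// Back-of-the-envelope frame-time math (plain C#, no Unity APIs).
public static class FrameMath
{
    // Milliseconds per frame at a given frame rate.
    public static double FrameMsFromFps(double fps) => 1000.0 / fps;

    // Highest frame rate possible if `msPerFrame` were the only work in the frame.
    public static double FpsFromFrameMs(double msPerFrame) => 1000.0 / msPerFrame;

    // Average nanoseconds per entity, assuming the per-frame cost is spread
    // evenly over all entities.
    public static double NsPerEntity(double msPerFrame, long entityCount) =>
        msPerFrame * 1_000_000.0 / entityCount;
}
```

With the numbers above: 100 ms spread over 1,000,000 entities is roughly 100 ns per entity, and that system alone would cap the frame rate at 10 FPS. At 0.4 FPS a frame takes about 2,500 ms, so most of the frame time would be going somewhere other than this one system.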

Technical Details:

  • My machine specs are Ryzen 9 5000 series, RTX 3060, 32GB (Alienware R15 M5)
  • There are no Systems running that would impact performance (as seen on the run times above)
  • Below is the code used to convert the prefab and instantiate things:
```csharp
// PrefabConverterDeclare.cs

using Unity.Entities;

// Declares the prefab referenced by each Knight authoring component so that
// conversion also produces an entity prefab for it.
[UpdateInGroup(typeof(GameObjectDeclareReferencedObjectsGroup))]
class PrefabConverterDeclare : GameObjectConversionSystem
{
    protected override void OnUpdate()
    {
        Entities.ForEach((Knight prefabReference) =>
        {
            DeclareReferencedPrefab(prefabReference.Prefab);
        });
    }
}
```
```csharp
// EntitySpawnerSystem.cs

using Unity.Collections;
using Unity.Entities;
using Unity.Mathematics;
using Unity.Transforms;
using Random = Unity.Mathematics.Random;

public class EntitySpawnerSystem : ComponentSystem
{
    private Random random;

    protected override void OnStartRunning()
    {
        random = new Random(56);
        Entities.ForEach((ref KnightPrefabComponent component) =>
        {
            // Instantiate all copies of the prefab in a single call.
            var entities = EntityManager.Instantiate(component.prefabEntity, component.totalOfInstances, Allocator.Temp);

            // Scatter them over a 200 x 200 area on the XZ plane
            // (one main-thread SetComponentData call per entity).
            foreach (var entity in entities)
            {
                EntityManager.SetComponentData(entity, new Translation
                {
                    Value = new float3(random.NextFloat(0, 200), 0, random.NextFloat(0, 200))
                });
            }

            // The NativeArray returned by Instantiate must be disposed.
            entities.Dispose();
        });
    }

    protected override void OnUpdate()
    {
    }
}
```
```csharp
// KnightPrefabComponent.cs

using Unity.Entities;

public struct KnightPrefabComponent : IComponentData
{
    public Entity prefabEntity;
    public int totalOfInstances;
}
```
```csharp
// Knight.cs

using Unity.Entities;
using UnityEngine;

// Authoring component: converts this GameObject into an entity that stores
// the converted prefab entity and how many copies to spawn.
public class Knight : MonoBehaviour, IConvertGameObjectToEntity
{
    public GameObject Prefab;
    public int totalOfInstances;

    public void Convert(Entity entity, EntityManager dstManager, GameObjectConversionSystem conversionSystem)
    {
        // Resolves because PrefabConverterDeclare declared Prefab as a referenced prefab.
        var prefab = conversionSystem.GetPrimaryEntity(Prefab);
        var component = new KnightPrefabComponent { prefabEntity = prefab, totalOfInstances = totalOfInstances };
        dstManager.AddComponentData(entity, component);
    }
}
```

Is it possible to increase performance on this case?

Well, first off, make sure material instancing is enabled.

Second, you’re better off making a custom renderer for your specific use case.
No renderer would be able to handle this many entities on screen at the same time unless it’s using indirect instancing.

So I highly suggest looking into it.

Disabling shadows should also help somewhat, as it removes extra draw passes and reduces the overall draw cost.

The Hybrid Renderer is oriented toward the default “brain-dead” conversion from GameObject to Entity rendering.
Not to mention that it’s not properly optimized yet.

Try looking away. You are rendering 1.7 billion triangles in 16k batches.
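To put that number in perspective, here is a plain C# sketch (hypothetical helper names, no Unity APIs; the 1.7k figure is rounded to 1,700 triangles per instance):

```csharp
// Triangle throughput estimate (plain C#, no Unity APIs).
public static class TriangleBudget
{
    // Triangles submitted per frame if every instance is visible at full detail.
    public static long PerFrame(long instances, long trianglesPerInstance) =>
        instances * trianglesPerInstance;

    // Triangles per second needed to hold a target frame rate.
    public static long PerSecond(long instances, long trianglesPerInstance, int targetFps) =>
        PerFrame(instances, trianglesPerInstance) * targetFps;
}
```

`PerFrame(1_000_000, 1_700)` gives 1,700,000,000 triangles, and holding 30 FPS would mean 51 billion triangles per second. The same math for a 12-triangle cube gives 12 million triangles per frame.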

Startup should be doable in a few seconds by doing the position initialization in a bursted loop.
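What makes that loop parallelizable (and Burst-friendly) is that each index derives its own random values instead of all iterations sharing one `Random` instance. Below is a plain .NET sketch of that idea, under stated assumptions: `Parallel.For` stands in for a Burst `IJobParallelFor`, and the hand-written integer hash is an illustrative choice of per-index random source, not a Unity API. In Unity, the results would be written into each entity’s `Translation`.

```csharp
using System;
using System.Threading.Tasks;

public static class SpawnPositions
{
    // Small stateless integer hash (xorshift-multiply mixer): same input, same output.
    static uint Hash(uint x)
    {
        unchecked
        {
            x ^= x >> 16;
            x *= 0x7feb352du;
            x ^= x >> 15;
            x *= 0x846ca68bu;
            x ^= x >> 16;
            return x;
        }
    }

    // Maps a hash to a float in [0, 1) using its top 24 bits.
    static float ToUnitFloat(uint h) => (h >> 8) * (1f / 16777216f);

    // Fills xs/zs with positions in [0, extent) on the XZ plane.
    // Each index computes its values from (seed, index) only, so iterations
    // share no state and can run in any order on any thread.
    public static void Fill(float[] xs, float[] zs, uint seed, float extent)
    {
        if (xs.Length != zs.Length)
            throw new ArgumentException("xs and zs must have the same length");

        Parallel.For(0, xs.Length, i =>
        {
            uint h = Hash(seed ^ Hash((uint)i));
            xs[i] = ToUnitFloat(h) * extent;
            zs[i] = ToUnitFloat(Hash(h)) * extent;
        });
    }
}
```

Because the output depends only on the seed and the index, the same seed (56 in the original code) always produces the same layout, no matter how the work is split across threads.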

Is this a thing? As far as I know the material instancing checkbox does nothing with SRP batching enabled. See my post here:
https://discussions.unity.com/t/828495/2

No idea tbh, I’ve never used the Hybrid Renderer for entities, only a custom-built one (my own).
Also, I’m not sure what the stats should show as “saved” in this case.

At least having the checkbox option should mean that the material supports instancing.

If OP enabled instancing and the stats show 16k batches (1023 instances per batch), then only indirect instancing would help in this case, as it seems OP is heavily CPU bound.
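For reference on the 1023 figure: Unity’s documentation for `Graphics.DrawMeshInstanced` caps a single call at 1023 instances, so a custom renderer has to split its instance list into chunks. A plain C# sketch of that chunking (hypothetical names, no Unity APIs):

```csharp
using System;
using System.Collections.Generic;

public static class InstanceBatcher
{
    // Per-call instance limit of Graphics.DrawMeshInstanced.
    public const int MaxInstancesPerCall = 1023;

    // Returns (Start, Count) ranges covering [0, totalInstances),
    // each holding at most maxPerBatch instances.
    public static List<(int Start, int Count)> Split(int totalInstances, int maxPerBatch = MaxInstancesPerCall)
    {
        if (totalInstances < 0) throw new ArgumentOutOfRangeException(nameof(totalInstances));
        if (maxPerBatch <= 0) throw new ArgumentOutOfRangeException(nameof(maxPerBatch));

        var batches = new List<(int Start, int Count)>();
        for (int start = 0; start < totalInstances; start += maxPerBatch)
            batches.Add((start, Math.Min(maxPerBatch, totalInstances - start)));
        return batches;
    }
}
```

For 1,000,000 instances this yields 978 ranges: 977 full ones of 1023 and a last one of 529. Each range would map to one instanced call per mesh/material pair, which is why counts this large push toward `DrawMeshInstancedIndirect`, where the instance count comes from an arguments buffer rather than this per-call limit.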

No, HybridRendererV2 always draws with instancing, regardless of the material flag.

Try adding a StaticOptimize component. The transform systems and the Hybrid Renderer have a lot of overhead for entities they think are moving.

This doesn’t look like an ECS optimization issue tbh, since you’re doing almost no game logic on the CPU. Rendering can be improved in other ways, for example by batching meshes by material and using Graphics.DrawMeshInstanced. You can even use Graphics.DrawMeshInstancedIndirect, which is more advanced, but there isn’t much documentation on it (you need to do some shader setup).
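“Batching meshes by material” here means grouping instances that share the same mesh and material, so each group can be drawn with instanced calls. A plain C# sketch, where the integer IDs and the `InstanceRef` type are hypothetical stand-ins for Unity `Mesh`/`Material` references:

```csharp
using System.Collections.Generic;

// Hypothetical instance record: which mesh and material it uses, plus its index.
public readonly struct InstanceRef
{
    public readonly int MeshId;
    public readonly int MaterialId;
    public readonly int Index;

    public InstanceRef(int meshId, int materialId, int index)
    {
        MeshId = meshId;
        MaterialId = materialId;
        Index = index;
    }
}

public static class MaterialBatching
{
    // Groups instance indices by (mesh, material); each group can then be
    // submitted as one or more instanced draw calls.
    public static Dictionary<(int MeshId, int MaterialId), List<int>> Group(IEnumerable<InstanceRef> instances)
    {
        var groups = new Dictionary<(int MeshId, int MaterialId), List<int>>();
        foreach (var inst in instances)
        {
            var key = (inst.MeshId, inst.MaterialId);
            if (!groups.TryGetValue(key, out var list))
            {
                list = new List<int>();
                groups[key] = list;
            }
            list.Add(inst.Index);
        }
        return groups;
    }
}
```

A model with several materials adds one group (and at least one draw call) per material, which is one reason single-material models scale better.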

Thank you all for all these replies! I’ll try the suggestions here and see if there are improvements.
@VergilUa : When you say instancing, are you talking about GPU instancing in the material?
@Enzi : Would you know where I add StaticOptimize?
@vectorized-runner : Is that batching different than GPU instancing?

Also, since you mentioned the triangle count, I’ll try a default cube and compare; as I understand it, there might be room for improvement using LOD variations too!

Graphics.DrawMeshInstanced should be the same as toggling GPU Instancing on the material (except maybe for culling: with DrawMeshInstanced you can write your own simplified culling, which could be faster, and there’s no renderer overhead), but
DrawMeshInstancedIndirect can render many more meshes in one batch, so you definitely need to check it out if you want to attempt this many meshes.
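A sketch of the kind of simplified culling mentioned above: test each instance’s bounding sphere against the camera frustum planes and only pass survivors to the instanced draw. It uses `System.Numerics` types instead of Unity’s, and assumes plane normals are normalized and point into the visible volume; in Unity, `GeometryUtility.CalculateFrustumPlanes` can provide the planes (check that its normal convention matches before relying on this).

```csharp
using System.Numerics;

// Sphere-vs-frustum visibility test (System.Numerics types, not Unity's).
public static class SimpleCulling
{
    // planes: e.g. the six camera frustum planes, normals normalized and
    // pointing into the visible volume (assumed convention).
    public static bool IsSphereVisible(Plane[] planes, Vector3 center, float radius)
    {
        foreach (var plane in planes)
        {
            // Signed distance from the sphere center to this plane
            // (positive = inside, negative = outside).
            float distance = Plane.DotCoordinate(plane, center);
            if (distance < -radius)
                return false; // entirely outside this plane: cull
        }
        // Conservative: spheres near frustum corners may pass while being just outside.
        return true;
    }
}
```

The conservative result only costs a little extra drawing near the frustum corners, while keeping the per-instance test to six dot products.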

So I tried with a default cube and made sure the SRP Batching was on (now it’s hidden under debug mode on the URP asset).
The FPS got way better for the default cube (77FPS!!!). Here are 1 million cubes (non-static)

Things to note: the number of batches is a lot higher, and the number of tris is a lot smaller.
Perhaps with an LOD setup I can bring down the number of tris, but I’m not sure how this batch number works!

Something to do with the material setup? This model I used for the initial test is an asset which uses multiple materials. Should I make sure assets have only one material as well?

If you want to scale to massive entity count, you definitely want to make models with just 1 material.

You want to share materials as much as possible between instances, using some kind of texture atlas. (Hybrid renderer has very fast per instance properties, so you can give objects different uvs while using the same material)
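The per-instance UV idea can be sketched as computing a scale/offset pair for a tile in a grid atlas; the shader would then sample at `uv * scale + offset` (the same layout as Unity’s `_ST` vectors: tiling in xy, offset in zw). This is plain C# with hypothetical names; how the value is fed to the shader (for example through a Hybrid Renderer per-instance material property) isn’t shown.

```csharp
using System;

public static class AtlasUv
{
    // Scale/offset for tile `index` in a columns x rows atlas.
    // Tiles are numbered left-to-right, then bottom-to-top (assumed layout).
    public static (float ScaleX, float ScaleY, float OffsetX, float OffsetY) Tile(int index, int columns, int rows)
    {
        if (columns <= 0 || rows <= 0)
            throw new ArgumentOutOfRangeException(nameof(columns), "columns and rows must be positive");
        if (index < 0 || index >= columns * rows)
            throw new ArgumentOutOfRangeException(nameof(index));

        float scaleX = 1f / columns;
        float scaleY = 1f / rows;
        int column = index % columns;
        int row = index / columns;
        return (scaleX, scaleY, column * scaleX, row * scaleY);
    }
}
```

For example, tile 5 of a 4 x 4 atlas is column 1, row 1, giving a scale of (0.25, 0.25) and an offset of (0.25, 0.25). All instances keep the same material; only this small per-instance value differs.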

If you are using URP, using Shader Graph is the simplest approach and should generate performant shader code that the Hybrid Renderer supports.

Depending on whether you are using Hybrid Renderer V1 or V2, you might have to enable instancing manually.

You want to make sure you have very good LODs. 4-6 LOD levels is generally a good idea.
Culling in the distance using LOD groups can also help a lot.
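A simplified sketch of LOD selection with distance culling. Note the assumption: Unity’s LOD groups actually switch on the object’s relative screen height, while this plain C# version (hypothetical names) uses camera distance, which behaves similarly for a fixed field of view:

```csharp
// Distance-based LOD pick with a cull distance (plain C#, no Unity APIs).
public static class LodPicker
{
    // lodMaxDistances[i] is the farthest camera distance at which LOD i is used;
    // values must be ascending. Beyond the last entry the object is culled.
    // Returns the LOD index, or -1 if culled.
    public static int Select(float distance, float[] lodMaxDistances)
    {
        for (int i = 0; i < lodMaxDistances.Length; i++)
        {
            if (distance <= lodMaxDistances[i])
                return i;
        }
        return -1;
    }
}
```

With thresholds of { 10, 30, 80, 200 }, an object 5 units away uses LOD 0, one at 150 uses LOD 3, and one at 500 isn’t drawn at all, which is where most of the savings come from in a wide scene.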

Generally, I would not advise using Graphics.DrawMeshInstanced or its indirect variant directly.
In real games you need culling and LODs, and you want it to work with lights & shadows, etc. With a custom Graphics.DrawMeshInstanced setup you need to do all of that yourself. With the Hybrid Renderer it works out of the box.

The Hybrid Renderer gives you that and is well optimized for massive amounts of instancing.
A bespoke solution that comes with limitations you know are acceptable in your specific game can, of course, always beat what we do. You just have to really know what those limitations are and what you will invest in.

Hybrid renderer is also going to get faster / less overhead / more feature complete, with future releases.

Speaking of which: any chance Graphics.DrawMeshInstanced gets an overload that takes a native container instead of a managed array / list?

That would significantly boost performance for custom-written renderers, as data management could be Bursted and float4x4 could be used (instead of Matrix4x4).

At least NativeArray would be nice to have.

I know indirect kind of covers the need for passing the matrices / data, but on mobile / mid-range devices compute shaders tend to give somewhat slower results.

I’ll get into Blender and create multiple LODs. I’ve been experimenting with this after the suggestions in this thread and it will indeed help, thanks for that.
I’ll also bake the textures and see how that impacts performance.

“if you are using URP, using shader graph is the simplest approach” - What did you mean by this?

Directly on the prefab/GameObject that’s being converted, as a regular component added in the Inspector.

At such large numbers you’ll need some clever LOD and animation solutions. For example: even if your models have only 100 triangles each, that still means rendering 100 million triangles per frame if they are all visible, which is still quite a lot even for a game targeting modern systems.
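The same arithmetic extends to a LOD mix: sum instances × triangles over each LOD level. A plain C# sketch with hypothetical names; the distribution in the example below is made up for illustration, not measured:

```csharp
using System;

public static class LodTriangleEstimate
{
    // instancesPerLod[i] instances are drawn at LOD i, each with trianglesPerLod[i] triangles.
    public static long Total(long[] instancesPerLod, long[] trianglesPerLod)
    {
        if (instancesPerLod.Length != trianglesPerLod.Length)
            throw new ArgumentException("one triangle count per LOD level is required");

        long total = 0;
        for (int i = 0; i < instancesPerLod.Length; i++)
            total += instancesPerLod[i] * trianglesPerLod[i];
        return total;
    }
}
```

One level of 1,000,000 instances at 100 triangles gives the 100 million mentioned above. A hypothetical mix of 5,000 instances at 1,700 triangles, 45,000 at 400 and 950,000 at 20 gives 8.5M + 18M + 19M = 45.5 million triangles per frame, still a lot, which is why the far levels need to be very cheap or culled.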

After reading all the comments and adjusting the project, I see that perhaps the brick wall here is not the number of entities but the number of triangles + materials?
I’ve just baked the materials (now it’s just one UV map per object) and set up some LODs.
It went from 0.5 FPS to 4 FPS.
I believe I’m going somewhere, so:

  • What’s a healthy number of tris visible on screen (that DOTS / Unity can cope with)?
  • If I want to do some culling to not render objects overlapping each other, how could I deactivate the render mesh? (Or activate some DOTS culling?)
  • @Joachim_Ante_1 you mentioned using Shader Graph for URP and enabling instancing manually: could you please tell me how I enable this instancing on Hybrid V2, and whether Shader Graph shaders are more performant than built-in ones?

> If I want to do some culling to not render objects overlapping each other, how could I deactivate the render mesh? (Or activate some DOTS culling)

LOD & culling are always active and automatic.

However, if by “overlapping each other” you mean you want occlusion culling, this feature isn’t fully implemented yet for Hybrid Renderer. The old occlusion culling system isn’t compatible either. You can try to enable the experimental occlusion culling in HRv2 but it’s pretty bare-bones right now, I wouldn’t recommend it.