Instantiate/LoadAsync Performance on SSDs vs HDDs, RAM Disks, Large Amounts of RAM, etc.

Hi,

I am working on a complex virtual world simulation, and I am also planning to build a new PC sometime before the end of the year (once the 3080 Ti and the next generation of Ryzen come out).

One of the main bottlenecks in my current project is Instantiate and LoadAsync. I do a lot of paging complex scenes in and out, and I am looking for ways to alleviate this.

Obviously, the main thing would be a good CPU, so the successor to the 3950X (4950X?) coming out this year would be a good start. (Even though Instantiate is single-threaded, the 3950X has similar single-core performance to comparable Intel CPUs, but far better multi-threading support.)

However, I have some other questions about how Instantiate and LoadSceneAsync actually behave on different types of hardware.

For example, will these two functions be faster on a fast SSD compared to a slow SSD, or an HDD? The Ryzen 3950X has 20 usable PCIe lanes, so I could use 16 for the graphics card and 4 for a dedicated SSD. Would this speed up asset loading and instantiation, or would the CPU be the bottleneck?

Secondly, can Unity support 128GB of RAM? Could I load a large number of complex assets into RAM at load time, to avoid having to instantiate them at all, or is there some hard limit in Unity that would prevent this?

I know there is a “bug” or known issue with a single scene being over 4GB (due to 32-bit addressing limits), but splitting each scene into smaller ones (under 4GB) and loading them one at a time seems to solve this. However, is there some other issue with loading assets into RAM, or is the amount of RAM on the system the only limiting factor?
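For reference, my scene streaming currently looks roughly like this (just a rough sketch; the class name, scene names, and unload logic are placeholders):

```csharp
using System.Collections;
using UnityEngine;
using UnityEngine.SceneManagement;

// Rough sketch: stream sub-scenes in additively, one at a time, so no single
// serialized scene has to exceed the 4GB limit. Scene names are placeholders.
public class ScenePager : MonoBehaviour
{
    [SerializeField] private string[] subScenes = { "Forest_Part1", "Forest_Part2" };

    private IEnumerator Start()
    {
        foreach (string sceneName in subScenes)
        {
            // Load additively so the pieces combine into one world.
            AsyncOperation load = SceneManager.LoadSceneAsync(sceneName, LoadSceneMode.Additive);
            while (!load.isDone)
                yield return null; // let other work happen between frames
        }
    }

    // Called when a region goes out of range, to page it back out.
    public IEnumerator Unload(string sceneName)
    {
        AsyncOperation unload = SceneManager.UnloadSceneAsync(sceneName);
        while (!unload.isDone)
            yield return null;
    }
}
```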

Finally, what about RAM disks? Would Instantiate/LoadSceneAsync be noticeably faster if the build were running from a RAM disk, or, again, would the CPU be the limiting factor?

Just to clarify, I am not planning to release a game that requires 128GB of RAM or a RAM disk to play; I doubt the majority of gamers have this kind of hardware. This is a concept project.

Thanks for any advice!

Once you step into the territory of building three-thousand-dollar-plus machines, a Threadripper is no longer an outrageous consideration. A 3960X is typically around twice the cost of a 3950X, but it has 50% more cores, twice the cache, twice the memory channels, can address 256GB of RAM, and has 64 PCIe lanes.

I don’t know how you determine memory bandwidth limitations, but it wouldn’t surprise me if you were running into them.

That’s a good point! Budget really isn’t the limiting factor for this build, I’m mainly concerned about performance.

However, what is the single-threaded performance of the 3960X? Those CPUs are really designed for workstation use; I’m not sure if they would be optimised for game development or for playing games.

Unity is still largely single threaded, so single-threaded performance is important.

The 3950X has a base clock of 3.5GHz with a boost of 4.7GHz. The 3960X has a base clock of 3.8GHz with a boost of 4.5GHz. That’s about 5% difference between boosts and 10% between bases. Gamers Nexus has a benchmark and the difference is a few frames.

GN Benchmark - 3960X versus 3950X and a few others

That’s interesting, the single-core performance is a lot better than I thought. Maybe that could be a viable option; 256GB of RAM would be a huge boost too!

Are these things also a bottleneck in your builds? If so, what kind of hardware are you targeting?

I haven’t looked heavily at build optimisation yet (I am aware, and hoping, that most of the issues are with the editor), but I am noticing heavy performance hits creating dynamic forests with many objects.

I am looking into procedurally generating trees (L-systems, etc.) instead of using Instantiate, but I will still likely need to use Instantiate somewhere.

The project I am working on probably won’t be released for at least 2 years, so even cutting edge hardware today would be starting to become somewhat more common by then. Plus, when the project is done it will of course be heavily optimised, whereas in development it wouldn’t be.

Try to avoid using Instantiate as much as possible. If you need to render thousands of the same thing, use instancing. Look at the DrawMeshInstanced method.

If your project will take a couple of years to complete, start with the Unity 2020.1 beta right now. That version is the first to support more than 4GB of content per scene. So if the 4GB-per-scene limit is what you were looking for a way around, just go with Unity 2020.1.

Wait, so the 4GB limit is gone in 2020.1?? That’s fantastic, I didn’t know that! I will probably wait until it comes out to use it, but I will definitely use that feature, that will help a lot.

I wasn’t aware of DrawMeshInstanced, I will certainly look into that, thank you! The problem is that with my forests, I have maybe… 9-10 different types of trees?

So each tree type might have, say, 50 instances, for a total of several hundred trees.

This is why I can’t (easily) use things like object pooling. I could create 9-10 different pools of 50 objects each, but at that point you might be losing a lot of the potential benefit.

Would DrawMeshInstanced work here? Could I call it 10 times for 50 instances each, and still get a benefit over Instantiate?

This thread has some relevant information:

https://discussions.unity.com/t/770289

Somebody else has already assumed that switching to a Threadripper would help them with game dev, and reported the results.

That is exactly the kind of thing that DrawMeshInstanced is excellent at. Basically, instead of telling the GPU “Draw this tree” for each of 1,000 trees, you’ll be saying “Draw this tree in these 50 spots” for each of your 10 different trees.

I can’t predict what the results for your trees will be, because it depends on whether your game is CPU or GPU bound. If your game is CPU bound then it’ll help drastically. I’ve used it to draw thousands of objects with performance indistinguishable from not drawing those objects at all.

This isn’t the type of problem that pooling is useful for. Pooling is useful for when you’ve got lots of similar objects with short life cycles.

Correction: they thought that switching would automatically provide better results, but it definitely requires that you design your application with it in mind. Unity’s DOTS framework can scale up with high core counts, but there are other routes, such as separating the world state from the actual engine and handling rendering yourself with DrawMeshInstanced.

Below is a statement by Joachim that they’ve tried as high as 32 cores with DOTS.

https://discussions.unity.com/t/704106

There is also a tweet by a developer reporting that DOTS scaled very well on their Threadripper.

https://twitter.com/SebAaltonen/status/1030325568255012864

That sounds fantastic! I will definitely check that out. My project does seem to be CPU bound at the moment (Although I have an older computer, so I’m guessing the graphics card is probably holding it back too) so that should help a lot.

Pooling is used to reduce the CPU cost of spawning/despawning objects, not rendering times.

It is also possible to create a single pool for all the objects you’re using. (A Dictionary mapping each prefab GameObject to a list or set of existing instances would do it; see the sketch below.)
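A minimal sketch of what I mean (the class and method names are just illustrative):

```csharp
using System.Collections.Generic;
using UnityEngine;

// One pool for everything: a Dictionary maps each prefab to a stack of
// inactive instances created from that prefab. Names here are illustrative.
public class PrefabPool : MonoBehaviour
{
    private readonly Dictionary<GameObject, Stack<GameObject>> pool =
        new Dictionary<GameObject, Stack<GameObject>>();

    public GameObject Get(GameObject prefab, Vector3 position, Quaternion rotation)
    {
        if (!pool.TryGetValue(prefab, out Stack<GameObject> instances))
        {
            instances = new Stack<GameObject>();
            pool[prefab] = instances;
        }

        // Reuse an inactive instance if one exists, otherwise instantiate a new one.
        GameObject go = instances.Count > 0 ? instances.Pop() : Instantiate(prefab);
        go.transform.SetPositionAndRotation(position, rotation);
        go.SetActive(true);
        return go;
    }

    public void Release(GameObject prefab, GameObject instance)
    {
        // Deactivate instead of destroying, and keep it for later reuse.
        instance.SetActive(false);
        pool[prefab].Push(instance);
    }
}
```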

Consider what kind of computer your audience is likely to have when they’re playing your game, too. You’re right that buying nice hardware for development helps with things that aren’t optimised yet, and with the overhead that comes with running an editor on top of your game. Still, you want to make sure that the game itself runs nicely on hardware that the majority of your audience has, and contrary to popular belief not all optimisation can be left until towards the end.

For instance, switching to DrawMeshInstanced can make rendering of repeated objects much more efficient, but it only handles the rendering. If those GameObjects have any other functionality - collision, audio, etc - then you’ll want to consider early on how to handle that, too. (And pooling could come in handy there.)

According to Unity, the 4GB limit per scene was fixed in Unity 2020.1.
https://discussions.unity.com/t/645929/page-5#post-5300130

DrawMeshInstanced is a great method to use if you want to render lots of things. If you have 10 different types of trees with 50 trees each, then you would call DrawMeshInstanced 10 times. Each call to DrawMeshInstanced accepts an array that contains the locations and rotations of all of the instances of a given type of tree. With DrawMeshInstanced, you could draw your 500 trees using 10 draw calls instead of 500 draw calls. It will be a huge improvement for you.
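A rough sketch of that setup, assuming you keep the positions and rotations of each tree type in arrays (the class name and fields are just placeholders, and the material needs GPU instancing enabled):

```csharp
using UnityEngine;

// Per-tree-type instancing data: one Matrix4x4 per placed tree of that type.
// Build one of these for each of your ~10 tree types, then submit each
// array with a single DrawMeshInstanced call (up to 1023 instances per call).
public class TreeInstanceData
{
    public Mesh mesh;            // shared mesh for this tree type
    public Material material;    // material with "Enable GPU Instancing" ticked
    public Matrix4x4[] matrices; // one entry per placed tree of this type

    public static TreeInstanceData Build(Mesh mesh, Material material,
                                         Vector3[] positions, Quaternion[] rotations)
    {
        var data = new TreeInstanceData { mesh = mesh, material = material };
        data.matrices = new Matrix4x4[positions.Length];
        for (int i = 0; i < positions.Length; i++)
        {
            // TRS packs position, rotation and scale into a single matrix.
            data.matrices[i] = Matrix4x4.TRS(positions[i], rotations[i], Vector3.one);
        }
        return data;
    }
}
```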

Here is a link to my instancing pool code, which uses DrawMeshInstanced to draw lots of laser projectiles.
https://github.com/ShilohGames/InstancingPoolDemo

That code won’t directly work for trees, but you could use it as an example to get an idea of how the method works.

How is this done? I have never heard of “Dictionary Mapping” before!

Do you mean creating a list of all instances of all GameObjects, and then, instead of instantiating and deleting them, moving them around the world to reuse them?

That’s a good point, and this is why I would tend to avoid CPUs like the Threadripper: no gamer is likely to have a 32-core CPU for a long time, whereas a gaming or “pro-sumer” CPU like the 3950X might become more common in 2-3 years or so.

I mean, a 1080, for example, is now a very common graphics card; a few years ago it would have been very high-end and available only to enthusiasts.

Thank you very much for that, that’s very helpful!

The only concern I have with DrawMeshInstanced is that it’s a single call, right?

With Instantiate, I can put it in a coroutine and spawn, say, 50 objects over the course of a few seconds (roughly like the sketch below). With DrawMeshInstanced I would have to draw all of the objects at once; could that be slower?
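For reference, my current staggered spawning is roughly this (the prefab field, count, and batch size are just placeholders):

```csharp
using System.Collections;
using UnityEngine;

// Sketch of spreading Instantiate calls over several frames so the cost
// doesn't land in a single frame. Prefab, count and batch size are placeholders.
public class StaggeredSpawner : MonoBehaviour
{
    [SerializeField] private GameObject treePrefab;
    [SerializeField] private int totalTrees = 50;
    [SerializeField] private int treesPerFrame = 5;

    private IEnumerator Start()
    {
        for (int i = 0; i < totalTrees; i++)
        {
            Vector3 position = new Vector3(Random.Range(-100f, 100f), 0f, Random.Range(-100f, 100f));
            Instantiate(treePrefab, position, Quaternion.identity);

            // Yield after every few instantiations so the frame can finish.
            if ((i + 1) % treesPerFrame == 0)
                yield return null;
        }
    }
}
```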

With Instantiate, you dynamically create more game objects in your scene based on a prefab you made. If you add lots of game objects to your scene, you bog your game down.

With DrawMeshInstanced, you do not add game objects to your scene. Instead, you directly add GPU-based rendering commands to each frame. You call DrawMeshInstanced once per frame in an Update method, and you completely skip all of the game object steps.
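A minimal sketch of that per-frame call (field names and the 50-instance count are just illustrative; the material must have GPU instancing enabled):

```csharp
using UnityEngine;

// No GameObjects are created; the matrices are built once and then
// submitted for rendering every frame with a single draw call.
public class InstancedTreeRenderer : MonoBehaviour
{
    [SerializeField] private Mesh treeMesh;
    [SerializeField] private Material treeMaterial; // instancing-enabled material
    private Matrix4x4[] matrices;

    private void Start()
    {
        matrices = new Matrix4x4[50];
        for (int i = 0; i < matrices.Length; i++)
        {
            Vector3 pos = new Vector3(Random.Range(-100f, 100f), 0f, Random.Range(-100f, 100f));
            matrices[i] = Matrix4x4.TRS(pos, Quaternion.identity, Vector3.one);
        }
    }

    private void Update()
    {
        // DrawMeshInstanced only draws for the current frame, so it has to be
        // re-issued every frame. One call here covers all 50 instances.
        Graphics.DrawMeshInstanced(treeMesh, 0, treeMaterial, matrices, matrices.Length);
    }
}
```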

I have worked with both DrawMeshInstanced and Instantiate. When you are dealing with lots of units, DrawMeshInstanced wins by a mile. With laser projectiles, DrawMeshInstanced is about 6-10 times as fast: roughly 6 times as fast as a really well-designed object pool, and roughly 10 times as fast as simply using Instantiate.

In my 3D space games, I can deliver 100-300 FPS using DrawMeshInstanced. Using Instantiate, the same hardware would get about 10-30 FPS. The performance difference is not even close.