GPU Instancing performance variation

Hey guys, I implemented GPU instancing, and everything works great when no scripts are attached to the prefab that I am instancing. I can instantiate 20,000 spheres and still get a decent 35 FPS; with 10,000 objects I get roughly 65 FPS, and with 5,000, 92 FPS. But when I attach a very simple script with one line of code to the prefab, performance decreases drastically.

With 5,000 I get 23 FPS,
with 10,000 I get 9 FPS,
and with 20,000 3 FPS.

When I did the same process with a somewhat more complex script, roughly 200 lines of code, the performance dropped even more, and when I tried 20,000 objects my computer could hardly instantiate them and eventually froze. Basically I am assuming that there is a lot of behind the scene things unity is doing to compile all the scripts attached to all those prefabs and that is what is causing the performance lag. Interestingly enough, though, when I check the performance in the profiler I only see heavy consumption in “Physics”, then in “Others”, while “Scripts” does not even appear. See attached image as reference.

Is there any way to save performance when more complex scripts are used? These are just small tests I did for a larger project I am developing, where I have quite a long set of scripts. My goal is to be able to run at least 5,000 instances at a decent frame rate; 35-50 FPS would be fine.

Any insights would be great!!

Thank you

First, this:


“Basically I am assuming that there is a lot of behind the scene things unity is doing to compile all the scripts attached to all those prefabs and that is what is causing the performance lag.”


For C#, compilation to IL happens at build time, not per object at run time. Depending on the scripting backend, that IL is either JIT-compiled when the code is first loaded (Mono) or converted ahead of time to native code (IL2CPP), which is to say compiled to the CPU’s native instructions, so there’s no compilation at runtime. Either way, the cost does not grow with the number of instances.


What you’re observing is the code itself running. GPU instancing is a powerful technique, but it only affects how those instances are rendered; it has no effect on the code attached to each object. While the graphics will be more efficient, 5,000 objects are still instantiated in C# (for 5,000 instances), and code is going to be executed for each one during Update and FixedUpdate (where applicable).


What may be required in this situation is to say what your code does. This is not with an aim to improving efficiency of that code (which may have limited impact as it is being multiplied by the number of instances), but to imagine a way of implementing the purpose of that code without being attached to each instance. Without knowing why you have the code attached I can’t hope to advise, but if you can fashion a supervisor, something that controls the instances, you may be able to greatly improve performance. For most optimization, the basic rule is that changing the algorithm, or the underlying method of performing the work, may have the largest impact on performance.


If you’re hardcore about this, you might investigate writing the per-object logic in C++ rather than C#. Here, again, the impact varies greatly (from minuscule to miraculous). It may be exactly the direction you’ll want, or it could be a nightmare you should avoid, depending on how deep you tend to go and whether or not you’re comfortable with C++. Google provides resources (you’re basically making a native plugin). However, before you consider it you must weigh your situation and objectives. This would start, primarily, as a learning experience and engineering experiment to see if it works for you. If you can’t afford to invest time in the investigation, you’re not a candidate for it. Since the fundamental problem is still the attachment of code to thousands of objects (and the associated work Unity performs calling Update and FixedUpdate, and/or whatever else is hooked in), C++ can offer SOME benefit (which could be exactly what you need), or no benefit at all. Once you’ve practiced some C++ code in Unity, you’ll find it is merely an alternative method of coding (once the prerequisites are out of the way), though with considerable overhead for each platform target.


That said, I think you’ll observe that if you attach scripts that do nothing, you’ll probably find little impact. If you then take that ‘do nothing’ script and add trivial work in Update and/or FixedUpdate, you’ll find an overhead you can measure by comparison, which is just the work required to make the method calls on 5,000 objects. After that, everything Update or FixedUpdate does will be multiplied by the number of objects running that code. This is why finding a way to accomplish your intentions with code that is not attached to every object has the highest potential impact.

Hi JVene, thanks for your reply.

My first script I tested was just

void Update()
{
    transform.position += Random.onUnitSphere * 0.1f;
}

and it still dropped performance horribly. The second script I tested was just some vector math for the alignment and cohesion behaviors used in flocking. My real project at the moment is composed of different classes. The computation on the instances is done by one base class of roughly 1,400 lines of code, which two other scripts inherit from. Those two scripts are the ones attached to the prefabs, and together they are around 800 lines. I did some C++ a while back, but never used it to its full potential. I am also doing some nasty distance checks, although not on every Update call, and I am using a KDTree for that.

Regarding this:

“but to imagine a way of implementing the purpose of that code without being attached to each instance. Without knowing why you have the code attached I can’t hope to advise, but if you can fashion a supervisor, something that controls the instances, you may be able to greatly improve performance.”

How can this be possibly done?

C# JIT-compilation is only done once when your C# script is loaded for the first time, and I think Unity adds some minor overhead afterwards to cache references to Unity Event Functions like e.g. Update().
However, this overhead does not increase with the amount of objects that use that script, so it’s unlikely to be the issue in your case.

Since your script is so simple, the overhead is most probably the cost of calling code in a managed assembly (your C# scripts) from an unmanaged executable (the Unity Engine). That overhead is quite big, so you should always try to minimize the use of per-frame events like Update() as much as possible.

When you have to update a large number of objects with the same script, you should therefore make sure to update them all from the same script instead of making them all update themselves.
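
As a rough sketch of that idea, applied to the one-line script from the question: a single manager component spawns the instances and moves all of them from its own Update(), so Unity makes one managed call per frame instead of thousands. The class name SphereManager and the fields prefab, count, step and spawnRadius are made up for this example, and it assumes the prefab itself has no script attached.

```csharp
using System.Collections.Generic;
using UnityEngine;

// Hypothetical supervisor: attach this to ONE empty GameObject in the scene.
// The prefab it spawns is assumed to carry no per-instance script.
public class SphereManager : MonoBehaviour
{
    [SerializeField] private GameObject prefab;      // assigned in the Inspector
    [SerializeField] private int count = 5000;       // number of instances to spawn
    [SerializeField] private float step = 0.1f;      // same step size as the original script
    [SerializeField] private float spawnRadius = 50f;

    // Cached transforms, so Update() never has to search the scene.
    private readonly List<Transform> instances = new List<Transform>();

    private void Start()
    {
        for (int i = 0; i < count; i++)
        {
            Vector3 position = Random.insideUnitSphere * spawnRadius;
            GameObject go = Instantiate(prefab, position, Quaternion.identity);
            instances.Add(go.transform);
        }
    }

    // One Update() call for the whole group instead of one per instance.
    private void Update()
    {
        for (int i = 0; i < instances.Count; i++)
        {
            instances[i].position += Random.onUnitSphere * step;
        }
    }
}
```

The per-object work is unchanged, so this only removes the per-call overhead; whether that is enough for the larger project would have to be measured in the profiler.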

Unity is very aware of that issue and has therefore released the Entity Component System to help implement such scripts in a clean way, avoiding the overuse of Update() events.

If you still need more performance, the C# Job System is the way to go because it even allows your code to run multithreaded, but it is a bit harder to implement.
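
For a feel of what that looks like, here is a minimal sketch using IJobParallelForTransform and TransformAccessArray from UnityEngine.Jobs. The names SphereJobMover, MoveJob, targets and step are invented for this example. UnityEngine.Random can only be called from the main thread, so the sketch fills a NativeArray of offsets on the main thread and lets the job apply them; in a real project the heavy math (for example the flocking vectors) would go inside the job instead.

```csharp
using Unity.Collections;
using UnityEngine;
using UnityEngine.Jobs;

// Hypothetical example: moves many transforms from one job per frame.
public class SphereJobMover : MonoBehaviour
{
    [SerializeField] private Transform[] targets;   // instances to move, assigned elsewhere
    [SerializeField] private float step = 0.1f;

    private TransformAccessArray access;
    private NativeArray<Vector3> offsets;

    // The job runs on worker threads; it only reads the precomputed offsets.
    private struct MoveJob : IJobParallelForTransform
    {
        [ReadOnly] public NativeArray<Vector3> Offsets;

        public void Execute(int index, TransformAccess transform)
        {
            transform.position += Offsets[index];
        }
    }

    private void OnEnable()
    {
        access = new TransformAccessArray(targets);
        offsets = new NativeArray<Vector3>(targets.Length, Allocator.Persistent);
    }

    private void Update()
    {
        // Random values are generated on the main thread (assumption of this sketch).
        for (int i = 0; i < offsets.Length; i++)
        {
            offsets[i] = Random.onUnitSphere * step;
        }

        var job = new MoveJob { Offsets = offsets };
        job.Schedule(access).Complete();   // wait so the transforms are final this frame
    }

    private void OnDisable()
    {
        // Native containers are not garbage collected and must be disposed.
        if (access.isCreated) access.Dispose();
        if (offsets.IsCreated) offsets.Dispose();
    }
}
```

Calling Complete() right after Schedule() keeps the sketch simple but gives up some overlap with other main-thread work; scheduling early in the frame and completing later is a common refinement.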

See Introduction to the Entity Component System and C# Job System