Hi!
I ported Visual Pinball’s physics engine to ECS. It’s a 1:1 port of the original C++ code, so it doesn’t use Unity’s DOTS/Havok physics. Due to Burst and ECS’ memory layout, I expected performance to be in the same ballpark as the C++ implementation, but I was utterly disappointed: it’s several orders of magnitude slower.
I’m looking for hints on where to focus first, so I’ll briefly describe the parts of my port where I had to improvise without documentation or examples. Both Visual Pinball and my port are open source, so I’ll link to the relevant code.
Game Loop
The physics engine uses 1ms ticks for each cycle. If a collision is detected, there might be additional cycles. So I needed a way to update my physics world more than once per frame.
I did that using a `ComponentSystemGroup` which has a reference to all the needed systems, and updates them in a loop in `OnUpdate()`. One of those systems does most of the simulation, and contains another loop.
So we have two groups, one with an outer loop, and one with an inner loop. Most systems are set to `[DisableAutoCreation]` and are called explicitly by those groups. There are additional groups that are updated, but they are empty classes and their systems are attached via `[UpdateInGroup]`.
I’m wondering what the overhead of this is, and if nesting is a problem.
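To make the nesting concrete, here is a minimal standalone C++ sketch of the outer/inner loop structure described above. It is not the actual port or Visual Pinball code: `PhysicsWorld`, `simulateTick`, `updateFrame` and the `collisionsPerTick` parameter are hypothetical names, and the "one extra cycle per collision" rule is a simplifying assumption.

```cpp
#include <cstdint>

// Hypothetical stand-in for the physics world; it only counts work done.
struct PhysicsWorld {
    std::uint64_t ticks = 0;   // number of 1 ms ticks simulated
    std::uint64_t cycles = 0;  // number of inner-loop cycles executed

    // Inner loop: one tick takes one cycle, plus one additional cycle
    // per detected collision (assumption made for this sketch).
    void simulateTick(int collisionsPerTick) {
        int remaining = collisionsPerTick;
        do {
            ++cycles;                  // one displacement + collision pass
        } while (remaining-- > 0);     // each collision costs one more cycle
        ++ticks;
    }
};

constexpr std::uint64_t kTickUsec = 1000;  // 1 ms physics tick

// Outer loop: called once per rendered frame. It advances the simulation
// in whole ticks until it has caught up with the frame time, and returns
// the new simulated time in microseconds.
std::uint64_t updateFrame(PhysicsWorld& world,
                          std::uint64_t simTimeUsec,
                          std::uint64_t frameTimeUsec,
                          int collisionsPerTick) {
    while (simTimeUsec + kTickUsec <= frameTimeUsec) {
        world.simulateTick(collisionsPerTick);
        simTimeUsec += kTickUsec;
    }
    return simTimeUsec;
}
```

At 60 fps a frame covers roughly 16 or 17 ticks, so every system called from the outer loop is updated at least that many times per frame, and more often when collisions add cycles.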
Concurrency
The original C++ code is single-threaded. It’s a pinball simulator, so apart from simulating multiple balls on multiple threads (knowing there’s usually only one ball), the job system won’t help much in terms of performance. And there’s no ROM emulation yet that would consume resources.
Joachim recognized this as well and suggested using `Entities.ForEach().Run()` for systems dealing with very few entities in order to still benefit from Burst.
The port currently features 13 systems that are each updated multiple times per frame. Could this cause a significant overhead? I remember reading that DOTS was designed for hundreds of systems, so I would say no, but my gut feeling tells me otherwise.
Data Size
I haven’t split up the ball data that is queried pretty much everywhere. It’s not a huge struct (65 bytes), but I could split it up further, to not load attributes that aren’t used in a particular system.
To my understanding, splitting up data allows for better concurrency, since the jobs can be scheduled more aggressively, but as mentioned in the previous section, I don’t think that there is much to gain in terms of concurrency.
Maybe the size is a problem though?
[SOLVED] Chunk Looping
Three of my systems write to other entities’ data, so I have jobs that loop through chunks and somewhat arbitrarily update data. I haven’t figured out any other way, and I find it pretty awkward due to the amount of boilerplate code.
One thing I noticed when debugging is that every ball entity seems to have its own chunk, while Entity Debugger showed me multiple entities per chunk.
Maybe I’m doing something fundamentally wrong here, so here are my three systems in order of execution:
- DynamicBroadPhaseSystem (solved, before)
- DynamicNarrowPhaseSystem (solved, before)
- DynamicCollisionSystem (solved, before)
Context: Dynamic entities are balls, i.e. these systems only deal with ball/ball collisions. The rest of the world is static and handled by other systems.
Solved! Replaced the awkward chunk looping with `ComponentDataFromEntity`.
Abstract Colliders
To resolve the collision between a ball and an object, I’ve implemented a [Collider](https://github.com/freezy/VisualPinball.Engine/blob/master/VisualPinball.Unity/VisualPinball.Unity/Physics/Collider/Collider.cs) struct. However, a collider can have different types. For example there’s a point collider and a plane collider. Both types contain additional type-specific data and obviously different logic (but the same interface).
In order to get an inheritance-like structure, my `Collider` struct just contains a header with the base data (including a type). The actual colliders (the “children”) contain the same header, plus their additional data at the tail.
When resolving collisions, the `Collider` is cast to the type defined in its header and then executed. Since the collider tree is part of a `BlobAssetReference`, I can allocate each collider based on its type.
This works, but it might be really bad for performance. Here are the relevant snippets:
- ColliderBlob (blob asset creation)
- PointCollider (allocation)
- Collider (cast and hit test)
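For readers who don’t want to open the links, the header-plus-tail pattern can be illustrated with a minimal standalone C++ sketch. This is not the port’s code: all type and function names below (`ColliderHeader`, `PointCollider`, `PlaneCollider`, `distanceTo`) are hypothetical, and the "hit test" is reduced to a plain distance computation.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

enum class ColliderType : std::uint8_t { Point, Plane };

// Base data shared by every collider. It must be the first member of each
// concrete collider so a header pointer can be cast back to the full type.
struct ColliderHeader {
    ColliderType type;
    std::int32_t id;
};

// "Children": same header, type-specific data at the tail.
struct PointCollider {
    ColliderHeader header;
    float x, y, z;            // position of the point
};

struct PlaneCollider {
    ColliderHeader header;
    float nx, ny, nz, d;      // unit normal and offset: n . p = d
};

static float pointDistance(const PointCollider& c, float bx, float by, float bz) {
    const float dx = bx - c.x, dy = by - c.y, dz = bz - c.z;
    return std::sqrt(dx * dx + dy * dy + dz * dz);
}

static float planeDistance(const PlaneCollider& c, float bx, float by, float bz) {
    return c.nx * bx + c.ny * by + c.nz * bz - c.d;  // signed distance
}

// Dispatch: read the type from the header, cast to the concrete collider,
// then run its logic. The cast is valid because the header is the first
// member of a standard-layout struct.
float distanceTo(const ColliderHeader* h, float bx, float by, float bz) {
    switch (h->type) {
        case ColliderType::Point:
            return pointDistance(*reinterpret_cast<const PointCollider*>(h), bx, by, bz);
        case ColliderType::Plane:
            return planeDistance(*reinterpret_cast<const PlaneCollider*>(h), bx, by, bz);
    }
    assert(false && "unknown collider type");
    return 0.0f;
}
```

The dispatch itself is a single branch on the type field, so in this simplified form the cast adds no copying; whether the same holds for the blob-based version is exactly what I’m unsure about.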
Get/Set Component Data
Some systems read and write data that isn’t queried in `Entities.ForEach`. For example, a ball-flipper collision fetches and updates the flipper’s movement data (while `Entities.ForEach` loops through balls).
So, based on the collider type, I first fetch additional data with `GetComponent<>()`, update it, and write it back with `SetComponent()`. (Side note: a compiler warning about updated data not being written back would have saved me days of debugging!)
How’s the performance of `GetComponent` and `SetComponent`? I know at least `GetComponent` is well documented, so I don’t assume that’s a bottleneck?
Then I’m also reading component data in the inner loop, i.e. not inside `Entities.ForEach`. Could this affect performance as well?
Thanks for reading if you made it this far.
If you’ve identified an issue by design, or you were courageous enough to crawl through the code and find a problem there, that’s awesome!
If you actually want to test this, clone the repo, create a new (built-in renderer) project in Unity, and add the cloned repo’s `package.json` as a local package in Package Manager. Then, use the new Visual Pinball menu to import a table (for example this one). “B” adds a new ball when playing.
Thanks in advance!
-freezy.