Modern CPUs have impressively large L1 caches, e.g. 32,768 bytes for data and the same again for code.
x86 instructions are variable length (anywhere from 1 to 15 bytes), but at an optimistic average of two bytes each, a 32K instruction cache could hold around 16,384 instructions to run your game.
What if you combined your DOTS systems into one large system? Instead of having lots of little systems interacting (and paying lots of management overhead), you write just one that fits in the CPU's cache.
OK, this is just your game scripts, and as soon as you step into the Unity Engine code you're outside of this, but your core game systems could fit into the L1 cache and that could give you optimal performance.
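To make the idea concrete, here's a minimal sketch of what a fused system might look like (the Velocity and Lifetime components are hypothetical, made up for this example): two passes that would normally live in separate systems share one small hot loop.

```csharp
using Unity.Entities;
using Unity.Mathematics;
using Unity.Transforms;

// Hypothetical components, invented for this illustration.
public struct Velocity : IComponentData { public float3 Value; }
public struct Lifetime : IComponentData { public float Remaining; }

// One fused system instead of a separate MoveSystem and LifetimeSystem:
// the hot loop is a single small piece of code that can stay resident
// in the instruction cache while it chews through all the chunks.
public class CombinedGameplaySystem : SystemBase
{
    protected override void OnUpdate()
    {
        float dt = Time.DeltaTime;
        Entities.ForEach((ref Translation pos, ref Lifetime life, in Velocity vel) =>
        {
            pos.Value += vel.Value * dt;  // what a MoveSystem would do
            life.Remaining -= dt;         // what a LifetimeSystem would do
        }).ScheduleParallel();
    }
}
```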
This could be ideal for small and simple games, but does the DOTS API allow all your game data to be passed into a single system?
It looks like all of its filtering mechanisms and pre-built systems (Translation) would get in the way of using DOTS this way.
Mostly. But you lose out on sanity, automatic dependency management, and version filtering. The latter is essential for not brute-forcing things.
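As a rough sketch of the change (version) filtering you'd be giving up (MyData and DerivedData are hypothetical components invented for this example):

```csharp
using Unity.Entities;

// Hypothetical components, invented for this illustration.
public struct MyData : IComponentData { public float Value; }
public struct DerivedData : IComponentData { public float Value; }

public class ReactiveSystem : SystemBase
{
    protected override void OnUpdate()
    {
        // Change filtering: whole chunks where MyData hasn't been written
        // since this system last ran are skipped, so you don't brute-force
        // every entity every frame.
        Entities
            .WithChangeFilter<MyData>()
            .ForEach((ref DerivedData derived, in MyData data) =>
            {
                derived.Value = data.Value * 2f; // placeholder work
            }).ScheduleParallel();
    }
}
```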
My largest system is 980 lines and is a skinned mesh bindings reactive system. That's ignoring the modified HR V2, which was a spaghetti monster before I touched it.
For me it is an academic question. Yes, you can write a game that will fit into the L1 cache, but that game will probably be so simple it does not matter much, at least for non-mobile games. In mobile games the additional benefits of efficiency are less heat and lower battery usage.
This is a pointless question: if you wanted your whole game to fit into 32K, you'd probably write it in assembler anyway. Heard of 512 B demos, 4 KB demos? Those old-school "intros"?
xoofx (the main developer behind SharpDX and Burst) used to make those as well.
These demos were limited in code size, not actual memory usage.
Here's an example (with slides on how it was made in the video description):
Trying to fit your code and DATA into 32K, but running it in the context of the Unity Engine? Why?
It won't stay in cache anyway; inside UnityEngine.dll execution will jump all over memory. You gain nothing.
Even if you fit everything into 32K, you won’t get light-speed performance.
Memory access is only "half" of the work; the actual ALU/FPU instructions are the second "half".
So unless you plan to do "nothing" with the data, you'll be limited by the ALU (which is not common).
I'd say being limited by the ALU is also the goal of DOTS, and it achieves that pretty well.
Things like linear memory access and auto-vectorization increase ALU utilization.
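As a sketch of the kind of loop that gets that treatment (assuming the standard Burst, Collections, and Jobs packages), a tight pass over a contiguous NativeArray is exactly the shape Burst can auto-vectorize:

```csharp
using Unity.Burst;
using Unity.Collections;
using Unity.Jobs;

[BurstCompile]
struct ScaleJob : IJob
{
    public NativeArray<float> Values;
    public float Factor;

    public void Execute()
    {
        // Contiguous, independent operations: memory is read in order
        // (prefetcher-friendly) and Burst can emit SIMD instructions.
        for (int i = 0; i < Values.Length; i++)
            Values[i] *= Factor;
    }
}
```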
To conclude:
You don’t need everything in L1 cache to have optimal performance.
It's enough to have it in much slower RAM and start loading it into cache in advance to prevent stalls. This is what CPUs already try to do, and linear memory access helps that a lot.
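A toy illustration of that point in plain C# (the array size and shuffled index table are made up for this example): both loops touch exactly the same bytes, but only the first gives the prefetcher a predictable pattern to stream ahead of.

```csharp
using System;
using System.Linq;

class PrefetchDemo
{
    static void Main()
    {
        int n = 1 << 24;                      // 16M ints (~64 MB), far larger than any cache
        var data = new int[n];
        var rng = new Random(42);
        var shuffled = Enumerable.Range(0, n) // shuffled index table for the random walk
                                 .OrderBy(_ => rng.Next())
                                 .ToArray();

        long seqSum = 0;
        for (int i = 0; i < n; i++)
            seqSum += data[i];                // linear: the prefetcher streams cache lines ahead

        long rndSum = 0;
        for (int i = 0; i < n; i++)
            rndSum += data[shuffled[i]];      // random: nearly every access is a cache miss

        Console.WriteLine($"{seqSum} {rndSum}");
    }
}
```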
It's the whole point of DOTS: performance. But while DOTS goes out of its way to ensure alignment and cache-friendly layout for data, every time a System runs its code also has to be loaded from memory into the L1 instruction cache.
So if you have lots of little systems being popped into and out of the L1 code cache, the bottleneck won't be data bandwidth but opcode cache loading and eviction.
The question is how big DOTS systems are (all that boilerplate adds up) and how many you can run before you hit the L1 instruction cache limit and start waiting on L2, L3, and RAM access latencies.
Could a DOTS-heavy game that uses a lot of small Systems be inherently slower than one that uses fewer, larger systems?
Hi, it's perfectly acceptable for people to ask questions, no matter how obtuse or difficult. Arowx is trying to create discussion around things like this. The real answer is not to suppress this voice but to move it to General Discussion, as it's not directly about working with DOTS as it exists today but is generally theoretical. Thanks for understanding.
No, it's been moved. Theoretical discussion is a general subject. Actual discussion (questions and answers) about things belongs in the relevant forums. This is why many complained this thread was spam. It is not "spam" if it is in General Discussion.
I like to encourage open-minded thought. There is usually a place where people can do this and others can opt in to participate rather than be excluded. That's my way.
It's dangerous hubris to limit rules to only what we know.
Further posts: please keep it on topic and constructive.
Yeah, I was going to say the binary for an old Atari or arcade game could probably fit without emulation. Modern game engines are of course not designed for those size constraints.