Modern CPUs have impressively large L1 caches, e.g. 32,768 bytes for data and the same again for code.
x86 instructions are variable length (anywhere from 1 to 15 bytes), but at an optimistic average of two bytes each, a 32K instruction cache could hold around 16,384 instructions to run your game.
What if you combined your DOTS systems into one large system? Instead of having lots of little systems interacting (and paying lots of management overhead), you write just one that fits in the CPU's cache.
OK, this is just your game scripts, and as soon as you step into the Unity Engine code you're outside of this, but your core game systems could fit into the L1 cache and that could give you optimal performance.
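To make the idea concrete, here's a minimal sketch of what a fused system might look like (the Velocity and Lifetime components are hypothetical, made up for this example): two passes that would normally live in separate systems share one small hot loop.

```csharp
using Unity.Entities;
using Unity.Mathematics;
using Unity.Transforms;

// Hypothetical components, invented for this illustration.
public struct Velocity : IComponentData { public float3 Value; }
public struct Lifetime : IComponentData { public float Remaining; }

// One fused system instead of a separate MoveSystem and LifetimeSystem:
// the hot loop is a single small piece of code that can stay resident
// in the instruction cache while it chews through all the chunks.
public class CombinedGameplaySystem : SystemBase
{
    protected override void OnUpdate()
    {
        float dt = Time.DeltaTime;
        Entities.ForEach((ref Translation pos, ref Lifetime life, in Velocity vel) =>
        {
            pos.Value += vel.Value * dt;  // what a MoveSystem would do
            life.Remaining -= dt;         // what a LifetimeSystem would do
        }).ScheduleParallel();
    }
}
```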
This could be ideal for small and simple games, but does the DOTS API allow all your game data to be passed into a single system?
It looks like all of its filtering mechanisms and pre-built systems (Translation) would get in the way of using DOTS this way.
Mostly. But you lose out on sanity, automatic dependency management, and version filtering. The latter is essential for not brute-forcing things.
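As a rough sketch of the change (version) filtering you'd be giving up (MyData and DerivedData are hypothetical components invented for this example):

```csharp
using Unity.Entities;

// Hypothetical components, invented for this illustration.
public struct MyData : IComponentData { public float Value; }
public struct DerivedData : IComponentData { public float Value; }

public class ReactiveSystem : SystemBase
{
    protected override void OnUpdate()
    {
        // Change filtering: whole chunks where MyData hasn't been written
        // since this system last ran are skipped, so you don't brute-force
        // every entity every frame.
        Entities
            .WithChangeFilter<MyData>()
            .ForEach((ref DerivedData derived, in MyData data) =>
            {
                derived.Value = data.Value * 2f; // placeholder work
            }).ScheduleParallel();
    }
}
```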
My largest system is 980 lines and is a skinned mesh bindings reactive system. That's ignoring the modified HR V2, which was a spaghetti monster before I touched it.
For me it is an academic question. Yes, you can write a game that will fit into the L1 cache, but that game will probably be so simple it does not matter much, at least for non-mobile games. In mobile games the additional benefits of efficiency are less heat and lower battery usage.
This is a pointless question: if you wanted your whole game to fit into 32K, you'd probably write it in assembler anyway. Heard of 512 B demos, 4 KB demos? Those old-school "intros"?
xoofx (the main developer behind SharpDX and Burst) used to make those as well.
These demos were limited in code size, not actual memory usage.
Here's an example (with slides on how it was made in the video description):
Trying to fit your code and DATA into 32K, but running it in the context of the Unity Engine? Why?
It won't stay in cache anyway; inside UnityEngine.dll execution will jump all over memory. You gain nothing.
Even if you fit everything into 32K, you won’t get light-speed performance.
Memory access is only "half" of the work; the actual ALU/FPU instructions are the second "half".
So unless you plan to do "nothing" with the data, you'll be limited by the ALU (which is not common).
I'd say being limited by the ALU is also the goal of DOTS, and it achieves that pretty well.
Things like linear memory access and auto-vectorization increase ALU utilization.
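As a sketch of the kind of loop that gets that treatment (assuming the standard Burst, Collections, and Jobs packages), a tight pass over a contiguous NativeArray is exactly the shape Burst can auto-vectorize:

```csharp
using Unity.Burst;
using Unity.Collections;
using Unity.Jobs;

[BurstCompile]
struct ScaleJob : IJob
{
    public NativeArray<float> Values;
    public float Factor;

    public void Execute()
    {
        // Contiguous, independent operations: memory is read in order
        // (prefetcher-friendly) and Burst can emit SIMD instructions.
        for (int i = 0; i < Values.Length; i++)
            Values[i] *= Factor;
    }
}
```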
To conclude:
You don’t need everything in L1 cache to have optimal performance.
It's enough to have it in much slower RAM and start loading it into cache in advance to prevent stalls. This is what CPUs already try to do, and linear memory access helps that a lot.
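A toy illustration of that point in plain C# (the array size and shuffled index table are made up for this example): both loops touch exactly the same bytes, but only the first gives the prefetcher a predictable pattern to stream ahead of.

```csharp
using System;
using System.Linq;

class PrefetchDemo
{
    static void Main()
    {
        int n = 1 << 24;                      // 16M ints (~64 MB), far larger than any cache
        var data = new int[n];
        var rng = new Random(42);
        var shuffled = Enumerable.Range(0, n) // shuffled index table for the random walk
                                 .OrderBy(_ => rng.Next())
                                 .ToArray();

        long seqSum = 0;
        for (int i = 0; i < n; i++)
            seqSum += data[i];                // linear: the prefetcher streams cache lines ahead

        long rndSum = 0;
        for (int i = 0; i < n; i++)
            rndSum += data[shuffled[i]];      // random: nearly every access is a cache miss

        Console.WriteLine($"{seqSum} {rndSum}");
    }
}
```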
It's the whole point of DOTS: performance. But while DOTS goes out of its way to ensure alignment and cache-friendly layout for data, every time a System runs its code also has to be loaded from memory into the L1 instruction cache.
So if you have lots of little systems being popped into and out of the L1 code cache, the bottleneck won't be data bandwidth but opcode cache loading and eviction.
The question is how big DOTS systems are (all that boilerplate adds up) and how many you can run before you hit the L1 instruction cache limit and start waiting on L2, L3, and RAM access latencies.
Could a DOTS-heavy game that uses a lot of small Systems be inherently slower than one that uses fewer, larger systems?
Hi, it's perfectly acceptable for people to ask questions, no matter how obtuse or difficult. Arowx is trying to create discussion around things like this. The real answer is not to suppress this voice but to move it to General Discussion, as it's not directly about working with DOTS as it exists today but is generally theoretical. Thanks for understanding.
No, it's been moved. Theoretical discussion is a general subject. Actual discussion (questions and answers) about things belongs in the relevant forums. This is why many complained this thread was spam. It is not "spam" if it is in General Discussion.
I like to encourage open-minded thought. There is usually a place where people can do this and others can opt in to participate rather than be excluded. That's my way.
It's dangerous hubris to limit rules to only what we know.
Further posts: please keep it on topic and constructive.
Yeah, I was going to say the binary for an old Atari or arcade game could probably fit without emulation. Modern game engines are of course not designed for those size constraints.