I have finished reading every page of the online book Data-Oriented Design. Afterwards, I am very tempted to redo the data design I started last year when I knew nothing, as it now looks naive and stupid. I think it is required reading that Unity’s Entities documentation cannot replace, since that documentation only covers how to use the library. You want to use the library to design the things mentioned in this book, or else you could end up doing OOP inside data-oriented code (e.g. not actually thinking about what the data looks like, so that Entity becomes a synonym for an object).
It may be quite intimidating to start reading, though, so I would like to give a rough overview of how each chapter will solve your problems coming from OOP. I actually typed these notes along the way to keep myself focused, so I’m just going to paste them now.
(You can read it in a bigger font at dataorienteddesign.com/dodbook. Highly recommended read.)
Chapters
-
Data-Oriented Design : Convinces you that OOP is only good for the human programmer.
-
Relational Databases : Proof that there is a surefire way to answer “how should I model my data” if your mind is still with OOP. By borrowing techniques from database people, each object can be untangled from its spaghetti of references and null fields into beautiful linear arrays, which you can then easily model with Unity’s Entities library. There are multiple levels of normalization you can choose from. When you linearize an array field out into a more data-oriented design, or maybe into a DynamicBuffer member, you can properly call it “1NF”.
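To make the normalization idea concrete, here is a minimal sketch in plain C++ (the book’s own language, not the Entities API). Every name in it (PlayerObject, PlayerRow, ItemRow, MountRow, normalize) is made up for illustration. It takes an OOP-style object with a nested array and a “maybe null” field and flattens it into rows with one value per column.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// OOP-style object (hypothetical): a nested array plus a field that is often "null".
struct PlayerObject {
    std::string name;
    std::vector<std::string> items;  // repeating group, not in 1NF
    int mountId;                     // -1 stands for "no mount"
};

// Flat rows: one value per column, related through playerId.
struct PlayerRow { std::uint32_t playerId; std::string name; };
struct ItemRow   { std::uint32_t playerId; std::string item; };
struct MountRow  { std::uint32_t playerId; int mountId; };

struct Tables {
    std::vector<PlayerRow> players;
    std::vector<ItemRow> items;    // one row per (player, item) pair
    std::vector<MountRow> mounts;  // a row exists only when there is a mount
};

Tables normalize(const std::vector<PlayerObject>& objects) {
    assert(objects.size() <= 0xFFFFFFFFu);  // ids are 32-bit in this sketch
    Tables t;
    for (std::uint32_t id = 0; id < objects.size(); ++id) {
        const PlayerObject& o = objects[id];
        t.players.push_back({id, o.name});
        for (const std::string& item : o.items) t.items.push_back({id, item});
        if (o.mountId >= 0) t.mounts.push_back({id, o.mountId});  // no null column
    }
    return t;
}
```

In Entities terms, I would guess the item rows map to something like a DynamicBuffer and the mount rows to a component that only some entities have, but that mapping is my reading, not something this sketch proves.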
-
Existential Processing : How to “de-if” your code through data design, so you can just iterate over everything without if guards or runtime checks.
-
Identify the ifs that really matter and are essential.
-
The eternally popular demand to check for a “subtype” (an OOP concept) while iterating, when IComponentData is just a flat thing with no hierarchy. Using generics does not help, since each instantiation becomes a completely new IComponentData. Baking in an enum, or using a by-value ISharedComponentData to characterize them in place of subclassing, is one solution that may come to you by common sense. This section expands on that, so you feel it is not such a bad thing to do compared to OOP.
-
It is not weird to perform a check first and then work on a chunk “blindly” after that one check, instead of the usual OOP way of checking each object at the very last chance. This is essentially ISharedComponentData filtering, so you can more or less switch-case on any concrete value and get the relevant data to work on.
-
Instead of runtime polymorphism (casting types, with some logic that changes its behaviour depending on the current type), changing the archetype could also solve this. You will fear that the data movement cost will be too much, but this section assures you that this is in fact the norm in data-oriented design.
-
The event system : This is almost the bane of data-oriented design. At first you feel there is definitely no elegant way out, and you were taking it for granted in OOP. There is a subsection for this, so you know you are not alone in finding it a problem. See how it relates to existential processing. I wasn’t expecting this subsection to be here at first.
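The “check once, then work blindly” idea from this chapter can be sketched in a few lines of plain C++. This is not the Entities API. Chunk, State and applyBurnDamage are names I made up, and the grouping by state only imitates what shared component filtering gives you.

```cpp
#include <cassert>
#include <vector>

enum class State { Idle, Burning };

// A chunk (hypothetical) holds only entities that share the same State value.
struct Chunk {
    State state;
    std::vector<float> health;
};

// The state is checked once per chunk. The inner loop has no branch on it.
void applyBurnDamage(std::vector<Chunk>& chunks, float damage) {
    assert(damage >= 0.0f);
    for (Chunk& chunk : chunks) {
        if (chunk.state != State::Burning) continue;  // the one check
        for (float& h : chunk.health) h -= damage;    // "blind" work
    }
}
```

Making an entity start burning would then mean moving its data into a Burning chunk instead of flipping a flag on it, which is the data movement the book says not to be afraid of.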
-
Component Based Objects : Finally we come to the C of ECS. The previous sections weren’t even working with “components” but with “just data”, so this section relates very directly to the Entities package.
-
It explicitly mentions Unity and its classical components (the attachable MonoBehaviour components), noting that they are not yet components in the data-oriented sense.
-
The harm of an object that imposes its definition on the data (facts) it contains.
-
How to deal with your Player uber-class, where each thing inside it sometimes talks to something else inside it, and in the end you cannot take any of them out.
-
Hierarchical Level of Detail : Or “HLOD”. Unexpectedly, the book takes on rendering directly, a concept specific to games and similar media.
-
It extends HLOD beyond rendering, though. You can “LOD” just about anything, such as using simpler processing when accuracy is not needed.
-
“Mementos” is an interesting term I first found here. When LOD makes a thing go to low definition, it should preserve some state so it can get back to it when it becomes high definition again.
-
Mementos deal with an occasional fear in the Entities package: destroying an Entity as part of the logic that makes your system work the way you want (the entity is no longer found, so something automatically happens or stops happening, eliminating the null reference problems found in OOP). You feel that “destroy” is a bit too much, yet there is no cleaner way, so you start hacking in a flag or a boolean, and then you have an if. Adding a tag component might help, but in some situations I think a memento is a very clean and data-oriented way. For example, adding a tag component requires modifying the system to account for it (with a None query). With a memento you have a clearer strategy for getting the entity back dynamically.
-
The author removes not only the rendering aspect from HLOD, but also the distance. You will realize that it is all about reducing work, a recurring theme in the Entities package and data-oriented design. This includes reducing the number of instances, even though the Entities package advertises that it is fine with millions of things. It also relates to state complexity from the previous section: using LOD (an axis of value) may be a more data-oriented way to determine what things should look like right now than a multitude of flags and booleans. It can be as simple as referring to a stack of dishes as a single thing with an int for how many dishes are in it. This is already HLOD because you reduced the detail. Mind blown!
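Here is how I picture the dish stack together with a memento, as a rough C++ sketch. The names (Dish, DishStackLow, collapse, expand) and the choice of an average dirt value as the preserved state are my own assumptions, not something taken from the book.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// High detail (hypothetical): every dish is its own instance.
struct Dish { float dirt; };

// Low detail: the whole stack is one thing. averageDirt acts as the
// memento, the bit of state kept so the detail can be rebuilt later.
struct DishStackLow {
    int count;
    float averageDirt;
};

DishStackLow collapse(const std::vector<Dish>& dishes) {
    float total = 0.0f;
    for (const Dish& d : dishes) total += d.dirt;
    const int count = static_cast<int>(dishes.size());
    return {count, count > 0 ? total / static_cast<float>(count) : 0.0f};
}

// Going back to high detail: regenerate plausible instances from the memento.
std::vector<Dish> expand(const DishStackLow& low) {
    assert(low.count >= 0);
    return std::vector<Dish>(static_cast<std::size_t>(low.count), Dish{low.averageDirt});
}
```

The rebuilt dishes are not the original ones, only believable ones. That is the point as I understand it: nothing is destroyed for good, and no flag or if is needed to remember that the stack was once detailed.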
-
Searching : This one also hits hard. It seems to go against the flow of data-oriented design to look for something that is not already segmented (chunked) by components, where previously you would do something like a LINQ query. Now you see inconveniences everywhere when you try to allow searching : code that is no longer parallelizable, tedious manual state maintenance, reduced performance, and ugly code.
-
Realize that a search is faster if the thing you are checking is not interrupted by other things in between in terms of cache lines. It is not just about the complexity of the search algorithm you use. How about a linear search that is faster than a binary search because it travels through memory in a straight line instead of jumping around, thanks to data-oriented design? The key is the 64-byte cache line size versus the size of each of your objects : how much free stuff can you get in one read? Big O notation in shambles.
-
Data-oriented search requires keeping in mind what your current cache line would be and getting clever with it. For example, off the top of my head, separating a field into a new IComponentData so that it can be iterated linearly sounds good for speeding up a search.
-
Get to know awesome algorithms where data-oriented design has an advantage, such as Bloom filters or B-trees.
-
Differentiate searching from sorting and hashing problems.
-
As an optimization, it is possible to just modify the previous search result if you are aware of when things are added or removed, so you don’t have to search again.
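The cache line arithmetic from this chapter is easy to try out. Below is a small C++ sketch where the 64-byte line size is an assumption (common, but check your target) and Monster, extractIds and findById are hypothetical names. It splits the searched key out of a fat object so that one cache line read brings in many keys instead of one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kCacheLineBytes = 64;  // assumption: typical, not universal

// How many values of type T fit in one cache line.
template <typename T>
constexpr std::size_t perCacheLine() { return kCacheLineBytes / sizeof(T); }

// Fat object (hypothetical): the key we search on sits between unrelated data,
// so each cache line read yields about one key.
struct Monster {
    std::uint32_t id;
    float position[3];
    float velocity[3];
    char name[36];
};

// Split the key into its own dense array. ids[i] belongs to monsters[i].
std::vector<std::uint32_t> extractIds(const std::vector<Monster>& monsters) {
    std::vector<std::uint32_t> ids;
    ids.reserve(monsters.size());
    for (const Monster& m : monsters) ids.push_back(m.id);
    assert(ids.size() == monsters.size());
    return ids;
}

// Plain linear search over the dense keys. Returns the index, or -1 if absent.
std::ptrdiff_t findById(const std::vector<std::uint32_t>& ids, std::uint32_t wanted) {
    for (std::size_t i = 0; i < ids.size(); ++i) {
        if (ids[i] == wanted) return static_cast<std::ptrdiff_t>(i);
    }
    return -1;
}
```

With 4-byte keys, perCacheLine gives 16 keys per read, while the fat struct gives roughly one. Whether that actually beats a binary search on your data is something to measure, not to assume.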
-
Sorting : Another common need that feels wrong with Entities. You seem to get everything in random order all the time because of Entity removals and insertions. An unsettling experience coming from the OOP world of hand-made Lists.
-
Think through whether there is any way other than sorting.
-
As with searching, data-oriented sorting should be aware of cache lines, and of which linear memory to work on in which order.
-
Get to know data-friendly algorithms such as radix sort, in-place bubble sort, or sorting networks. There are several given to you in the Entities package, as well as an extension to work with NativeArray.
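As an example of what “data-friendly” means here, this is a minimal least-significant-digit radix sort for 32-bit keys in C++. It is a generic textbook version written for this note, not the implementation from the book or from the Entities package. Notice that every pass walks the arrays front to back, with no comparisons between elements.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// LSD radix sort: four passes, one per byte, each pass a stable counting sort.
void radixSort(std::vector<std::uint32_t>& keys) {
    std::vector<std::uint32_t> scratch(keys.size());
    for (int shift = 0; shift < 32; shift += 8) {
        // offsets[b + 1] first counts how many keys have byte value b.
        std::array<std::size_t, 257> offsets{};
        for (std::uint32_t k : keys) ++offsets[((k >> shift) & 0xFFu) + 1];
        // The prefix sum turns the counts into the start position of each bucket.
        for (std::size_t b = 0; b < 256; ++b) offsets[b + 1] += offsets[b];
        // Scatter in input order, which keeps the pass stable.
        for (std::uint32_t k : keys) scratch[offsets[(k >> shift) & 0xFFu]++] = k;
        keys.swap(scratch);
    }
}
```

The cost is a scratch buffer the same size as the input, which is the usual trade for getting linear passes over memory.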
-
Optimisations : Makes clear which kind of premature optimization is bad.
-
Premature optimization is the root of all evil because OOP and its instance-based design abstract so much that every optimization looks premature, until the problems show up later, and then you cannot optimize anyway because OOP is getting in the way. In data-oriented design you see the impact early, so the optimization no longer looks premature and is therefore not evil. You can fix things right away, or improve them later.
-
Don’t give in to hopeful optimizations. Determine, profile, make reports. We have the Performance Testing package for this. It is great because it takes care of the messy warm-up step and report generation, exactly so that the optimization isn’t hopeful.
-
Gives you several optimization examples for common problems.
-
Several OOP optimizations brought up here will hit home. You have done them before. How do they fare in data-oriented design? Most of them reveal their weakness in cache line utilization.
-
One part deals with data addition and deletion combined with multithreaded code. In this part you will appreciate that Unity’s Entities manages that for you.
-
“Don’t delete” is good advice, so that other concurrent systems do not have to sync. They can continue to process values that end up unused.
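This is how I understand “don’t delete”, as a single-threaded C++ sketch with made-up names (Pool, retire, compact). The array never shrinks during the frame, so indices stay valid for anyone still reading. Retired slots keep getting processed, and the real removal is deferred to one sync point. Actual concurrent code would still need proper synchronization around that point, which this sketch does not show.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Pool {
    std::vector<float> value;
    std::vector<std::uint8_t> retired;  // 1 = logically deleted, storage kept

    std::size_t add(float v) {
        value.push_back(v);
        retired.push_back(0);
        return value.size() - 1;
    }

    // "Delete" without touching the array layout: indices stay stable.
    void retire(std::size_t i) {
        assert(i < value.size());
        retired[i] = 1;
    }

    // Processes every slot, retired or not. The wasted work is the price
    // paid for not restructuring the array mid-frame.
    void update(float dt) {
        for (float& v : value) v += dt;
    }

    // Called at a sync point (e.g. end of frame): actually drop retired slots.
    void compact() {
        std::size_t out = 0;
        for (std::size_t i = 0; i < value.size(); ++i) {
            if (!retired[i]) value[out++] = value[i];
        }
        value.resize(out);
        retired.assign(out, 0);
    }
};
```

A typical frame would call retire any number of times, run update as usual, and call compact once when nothing else is reading the pool.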
-
Helping the compiler : Needless to say, we are glad that most of this chapter’s problems are solved by C# Jobs limitations, Entities API design, and Burst. We would otherwise have to face them if we coded in pure C++. But it is still useful to know about them.
-
Maintenance and reuse : One of the bigger arguments against the data-oriented approach is that you can no longer subclass and build infinitely on the previous thing. This chapter makes you feel better by showing that there are more dimensions of extension than inheritance.
-
Why data-oriented design causes fewer bugs has been covered many times already, but it is covered again here. The Entity query is already a protection against the null references of OOP, since the work doesn’t even start if there is nothing to work on.
-
Convinces you, and reminds you of your past OOP terrors, why data-oriented design could make debugging faster. When he mentioned a chain of ifs with multiple hidden returns in them, I could immediately relate it to that one method I spent a day creating and debugging. It wouldn’t have had to exist if it were data-oriented.
-
A different view on the word “reuse”. You can indeed carry something from a data-oriented project forward.
-
Unit testing is hard in OOP because of the setup step where you have to wire up objects, and that makes us too lazy to do it. When it’s just data, it is clear what to set up and what to look for : it’s data. No more instantiating a GameObject and placing it in an imaginary world we can’t even see because it is an Edit Mode test.
-
What’s wrong? : This is actually a good first chapter to read. It answers the question of why you should drop the OOP mindset.
-
It reminds you that the current trends in OOP are only good for you but bad for the machine and your players, and why data-oriented design solves that.
-
Explains why virtual calls, which come as a feature of OOP, are much more harmful (to the data in your cache) than you think. Most devs think there is no way the work inside isn’t worth the “tiny” virtual call cost, and that’s what’s wrong. As he talks about how to avoid this problem, it ends with : how about we stop doing OOP altogether?
-
Abstractions can help you solve problems, but in game development the problem is performance and that abstraction is of the wrong kind.
-
The chapter can make you crave inheritance less.
-
A more realistic view of the “reuse” provided by OOP. Generic code is not as good as it sounds. It is only good on paper.
