I’ve been working on creating a 3D voxel world and I’m wondering how I should store information about voxels. Currently I’m using ScriptableObjects like so:
[CreateAssetMenu(fileName = "Material", menuName = "TerrainMaterial")]
public class TerrainMaterial : ScriptableObject
{
    public string materialName;
    public int durability;
}
Each chunk then stores a 3D array of these:
public struct TerrainVoxel
{
    public TerrainMaterial terrainMaterial;
    public int durabilityLeft;

    public TerrainVoxel(TerrainMaterial mat)
    {
        terrainMaterial = mat;
        durabilityLeft = mat.durability;
    }
}
Given that I’m planning for a single chunk to be 32x160x32 voxels, is this a memory-efficient way of doing this? I’d heard ScriptableObjects let many objects reference the same shared data, which should help, though I’ve seen others store a byte/int identifier and look up information based on that when needed.
Well, it’s not really memory efficient: on 64-bit platforms (pretty much the default nowadays) a reference takes up 8 bytes (64 bits). So your “TerrainVoxel” struct holds 12 bytes of fields, and the runtime will typically pad that to 16 bytes so the reference stays 8-byte aligned. As you mentioned, it’s usually much better to keep a static array that holds all the different “type objects” and give each voxel just an index into that array. For ease of use you can of course add a property to the struct that does the lookup for you on the fly. To go in the reverse direction, the easiest solution is to store each object’s index in a field inside that object as you add it to the array / List during initialization. That way each of your “TerrainMaterial” objects knows its own index.
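To make the index idea a bit more concrete, here is a minimal sketch. It uses a plain class instead of a ScriptableObject so it runs outside Unity, and `MaterialRegistry`, `Register`, `Get`, `index` and the choice of `ushort` are all made-up names and assumptions for illustration, not anything from the original code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical plain-class stand-in for the TerrainMaterial ScriptableObject.
public class TerrainMaterial
{
    public string materialName;
    public int durability;
    public ushort index; // assigned by the registry, so the material knows its own slot
}

// Hypothetical static lookup table that holds every material exactly once.
public static class MaterialRegistry
{
    private static readonly List<TerrainMaterial> materials = new List<TerrainMaterial>();

    public static ushort Register(TerrainMaterial mat)
    {
        if (materials.Count > ushort.MaxValue)
            throw new InvalidOperationException("Too many materials for a ushort index.");
        mat.index = (ushort)materials.Count;
        materials.Add(mat);
        return mat.index;
    }

    public static TerrainMaterial Get(ushort index) => materials[index];
}

// Two ushort fields: 4 bytes per voxel instead of a reference plus an int.
public struct TerrainVoxel
{
    public ushort materialIndex;
    public ushort durabilityLeft;

    public TerrainVoxel(TerrainMaterial mat)
    {
        materialIndex = mat.index;
        durabilityLeft = (ushort)mat.durability; // assumes durability fits in 16 bits
    }

    // Convenience lookup, so calling code can still write voxel.Material.materialName.
    public TerrainMaterial Material => MaterialRegistry.Get(materialIndex);
}
```

In Unity you would probably fill the registry once at startup from your ScriptableObject assets; the voxel struct itself never needs to hold a reference.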
Though you have to think about what information you actually need for each voxel. For example, Minecraft used to have just a single byte as the block type, plus 4-bit fields for additional metadata and lighting packed into extra bytes. So a single voxel took only 2 to 3 bytes. Of course the newer versions use more. I don’t know how MC actually handles the chunks in memory. However, the newer file formats store a palette / list of used blocks for each section (a section is a 16x16x16 area) and then use a varying number of bits per voxel in a bit stream to store the block type as an index into that palette. So if only a few block types are used, this can massively reduce the amount of data that needs to be stored.
However, the representation in memory is usually not compressed, as you need quick random access. While in theory the same method would work in memory, an issue arises when you add a block type to a section that wasn’t previously present in that section: the palette grows, and once it outgrows the current number of bits per voxel the whole section has to be repacked. One could implement a “simple” and a “complex” version of a chunk section, where the simple one uses a byte per voxel together with a local block palette while the complex one uses something like an int instead. Of course, when you exceed the capabilities of a simple chunk section it would need to be replaced with a complex one. This would be quite a bit of bookkeeping work.
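As a rough illustration of the palette idea and the repacking problem, here is a small sketch. It is not how Minecraft implements it; `PackedArray`, `PalettedSection`, the 1-bit starting width and “block id 0 is air” are all assumptions made for this example.

```csharp
using System;
using System.Collections.Generic;

// Fixed-size array of small unsigned values packed into 64-bit words.
// Entries never straddle two words, which keeps Get/Set simple.
public class PackedArray
{
    private readonly ulong[] words;
    private readonly int perWord;
    private readonly ulong mask;
    public int Bits { get; }

    public PackedArray(int length, int bits)
    {
        if (bits < 1 || bits > 16) throw new ArgumentOutOfRangeException(nameof(bits));
        Bits = bits;
        perWord = 64 / bits;
        mask = (1UL << bits) - 1;
        words = new ulong[(length + perWord - 1) / perWord];
    }

    public int Get(int i)
    {
        int shift = (i % perWord) * Bits;
        return (int)((words[i / perWord] >> shift) & mask);
    }

    public void Set(int i, int value)
    {
        int shift = (i % perWord) * Bits;
        ulong cleared = words[i / perWord] & ~(mask << shift);
        words[i / perWord] = cleared | (((ulong)value & mask) << shift);
    }
}

// One 16x16x16 section storing palette indices instead of global block ids.
public class PalettedSection
{
    public const int Size = 16 * 16 * 16;
    private readonly List<int> palette = new List<int> { 0 }; // slot 0 = air (assumed id 0)
    private PackedArray data = new PackedArray(Size, 1);

    public int BitsPerVoxel => data.Bits;

    public int GetBlock(int i) => palette[data.Get(i)];

    public void SetBlock(int i, int blockId)
    {
        int local = palette.IndexOf(blockId);
        if (local < 0)
        {
            // New block type for this section: extend the palette and,
            // if the index no longer fits, repack every voxel one bit wider.
            local = palette.Count;
            palette.Add(blockId);
            if (palette.Count > (1 << data.Bits)) Grow();
        }
        data.Set(i, local);
    }

    private void Grow()
    {
        var bigger = new PackedArray(Size, data.Bits + 1);
        for (int i = 0; i < Size; i++) bigger.Set(i, data.Get(i));
        data = bigger;
    }
}
```

Note that this sketch never removes unused palette entries and repacks the full section on every growth step, which is exactly the kind of bookkeeping cost mentioned above.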
So it generally depends on what things you want to support. MC has now “evolved” to support an insane number of block types and has therefore changed its data formats several times over the years.
I would probably go with an int for the block type. I’m not sure how much metadata I would pack into a voxel itself. MC has the concept of “TileEntities”, which are essentially extra objects that are “attached” to a certain voxel / block. Those are stored separately from the actual voxel data and contain any additional state information necessary for the block. Though MC was quite clever, so many things did not need a tile entity. Redstone dust used the 4 bits of metadata to represent its power level. Orientable blocks used some of the bits to represent the orientation. Things like doors used one of the bits to represent the open / closed state and 2 of them for the orientation, and so on. Only things like chests required an actual TileEntity, which stored the contents.
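Just to show what that kind of metadata packing can look like, here is a tiny sketch for a door. The bit layout (2 low bits for orientation, 1 bit for open / closed) and the `DoorMeta` helper are invented for this example and are not MC’s actual layout.

```csharp
using System;

// Hypothetical layout of a 4-bit metadata value for a door:
//   bits 0-1: orientation (0..3)
//   bit  2  : open flag
//   bit  3  : unused
public static class DoorMeta
{
    private const int OrientationMask = 0b0011;
    private const int OpenBit = 0b0100;

    public static byte Pack(int orientation, bool open) =>
        (byte)((orientation & OrientationMask) | (open ? OpenBit : 0));

    public static int Orientation(byte meta) => meta & OrientationMask;

    public static bool IsOpen(byte meta) => (meta & OpenBit) != 0;

    // Flip only the open flag, leaving the orientation bits untouched.
    public static byte Toggle(byte meta) => (byte)(meta ^ OpenBit);
}
```

A door facing direction 3 that gets opened would go `Pack(3, false)` then `Toggle(...)`, and `Orientation(...)` still returns 3 afterwards.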
Just as general advice: don’t use multidimensional arrays in C#. They are slow and add a lot of boilerplate code which in most cases is not necessary. In the end a multidimensional array is just a flattened array anyway, so using a flattened array directly is a lot faster and in a lot of cases simpler. Sticking to powers of two for the dimensions also has a nice side effect, as each coordinate inside a chunk / section maps directly to a fixed number of bits in the flattened index. So the classic 16³ sections MC uses need exactly 4 bits for each dimension. Chunk dimensions of 32x32 are a bit less convenient as they require 5 bits per dimension, though it would still be possible to use bit shifts to construct an index for the array. Creating chunks / sections that are too large is usually not a good idea, though, as you have to update the whole chunk / section whenever a block in it changes.
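Here is a small sketch of the bit-shift indexing for a 16³ section. The x / z / y ordering and the `SectionIndex` name are arbitrary choices for this example; any consistent ordering works.

```csharp
using System;

// Flattened indexing for a 16x16x16 section: 4 bits per axis, 12 bits total.
public static class SectionIndex
{
    public const int Bits = 4;
    public const int Size = 1 << Bits;          // 16
    public const int Mask = Size - 1;           // 0b1111
    public const int Volume = Size * Size * Size; // 4096

    // x occupies bits 0-3, z bits 4-7, y bits 8-11. Coordinates must be in 0..15.
    public static int ToIndex(int x, int y, int z) =>
        x | (z << Bits) | (y << (2 * Bits));

    public static void FromIndex(int index, out int x, out int y, out int z)
    {
        x = index & Mask;
        z = (index >> Bits) & Mask;
        y = (index >> (2 * Bits)) & Mask;
    }
}
```

The voxel data then lives in a single array of length `SectionIndex.Volume`, and a lookup is just `voxels[SectionIndex.ToIndex(x, y, z)]`.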
Note that even though MC generates the world in chunks, the rendering is based on chunk sections and only those are actually generated as a mesh based on the render distance.
Thank you for all that helpful information, this gives me a lot to go on! Very glad I learned all this before I got too deep into this haha. I want to keep fairly simple data for terrain, as I’m thinking of using GameObjects as opposed to different block types for interactable structures and such.
One more question, if I use an array of ints to store the index to access voxel data, how could I store information per-block such as how many hits left to break?
You need to start implementing something. This is just basic data design 101.
You could make the int instead be a struct with all the data you want.
Or you could keep two correlated arrays.
Or you could encode the data into high bits of the integer.
Or something else!
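For instance, the “two correlated arrays” option could look roughly like this. `ChunkData`, the `durabilityByMaterial` lookup table and “material id 0 is air” are assumptions made up for the sketch.

```csharp
using System;

// Two parallel arrays sharing the same flattened voxel index:
// one for the material id, one for the remaining hits.
public class ChunkData
{
    private readonly ushort[] materialIds;
    private readonly byte[] hitsLeft;
    private readonly byte[] durabilityByMaterial; // hypothetical table: material id -> hits to break

    public ChunkData(int voxelCount, byte[] durabilityByMaterial)
    {
        materialIds = new ushort[voxelCount];
        hitsLeft = new byte[voxelCount];
        this.durabilityByMaterial = durabilityByMaterial;
    }

    public ushort MaterialAt(int i) => materialIds[i];
    public byte HitsLeftAt(int i) => hitsLeft[i];

    public void Place(int i, ushort materialId)
    {
        materialIds[i] = materialId;
        hitsLeft[i] = durabilityByMaterial[materialId];
    }

    // Applies one hit; returns true when the voxel breaks and becomes air.
    public bool Hit(int i)
    {
        if (materialIds[i] == 0) return false; // air (assumed id 0) cannot be hit
        if (hitsLeft[i] > 1)
        {
            hitsLeft[i]--;
            return false;
        }
        materialIds[i] = 0;
        hitsLeft[i] = 0;
        return true;
    }
}
```

That is 3 bytes per voxel. The struct and high-bits options store the same information, just laid out differently.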
Whatever it is, it can be done, that’s not in question, as we have Minecraft as an example, plus five BILLION youtube videos about “how to make minecraft in game engine XXX!”