So say I had a decent sized 2D array, like 2048x2048. If only filled 2000 elements of it would the other elements still be taking up a large chunk of space?
I am using a 2D array because it makes it easy to look of the value of say position (1356, 2000), but I won’t be filling the whole thing.
I believe the array itself will take up 32 bits per “slot” on an x86 machine and 64 bits on an x64 machine. 2048 by 2048 would be 4Mb if each slot was 1 bit, so for a 32 bit architecture, you’re looking at 128 Mb or 16 MB of memory just for the empty array.
It’s that a lot? I think it depends on what device you are targeting and how it compares relative to the 2000 “items” that will actually be created in the heap.
What is this array holding?
Instead of using a huge 2d array, perhaps you could override the indexing operator and create the convenience access logically while using a smaller array behind the scenes. Or perhaps you could use a dictionary. Can you provide any more details oh what precisely you would like to achieve?
It is for a procedural generated world. There are 3 sizes of cells, which will be layered on top of each other to get the correct tile data:
2048x2048
512x512
64x64
Only the 64x64 cell will be filled (the most detailed cell). The rest will have various elements scattered throughout them, so there is no telling exactly how much space is needed. The only thing know is the given sizes. Each element is a class that contains 4 bytes.
My first inclination would be to use a custom collection. Perhaps a dictionary underneath. You could use a IntVector2 as a key. But on the outside you could just access it with a normal index.
Untested pseudo code. I’ve probably forgotten something important. But you get the idea.
public class MyCollection {
Dictionary <IntVector2, DataObject> myData;
public DataObject this [int x, int y] {
get {
if (!myData.ContainsKey( new IntVector2 (x,y))) return null;
return myData [new IntVector2 (x,y)];
}
set {
myData.Add(new IntVector2 (x, y), value);
}
}
}
public struct IntVector2 {
public int x;
public int y;
public IntVector2 (int x, int y) {
this.x = x;
this.y = y;
}
}
If you’re doing nothing but holding int data or something, you can consume a lot less memory by making it a smaller type. A byte array, for instance, uses 1/4th the space of a normal int array, but you’re limited to 0-255 as values. Small-palette images like gifs tend to use this (256 unique colors defined, which colors don’t matter- all pixels made up of those 256 colors).
In this specific case, I’m thinking that access time is probably a bigger concern than size though (a 1000x1000 multidimensional array is 1,000,000 objects to iterate through), which means you should probably break your data into larger sets like quadrants (google octrees for more information on super-efficient methods to do this). Even with just simple quadrants though, the first check automatically cuts the amount of possibilities down by 3/4ths- that’s an absolutely massive difference.
Since this matrix represents a world, I assume it will be populated by exploring adjacent cells. Data trees are generally faster when sorting or searching through sorted data but I don’t think it will be faster for accessing an unknown value at a known position. However, the tree approach can save a lot of space because you won’t be allocating a lot more memory than you need.
If trees aren’t your thing, you can think of this as a fractal grid. If the player steps off the original 32×32 grid, you can “step up one level” and have a new 32×32 grid which actually represents 1024 32×32 grids. Each time the player reaches the end of the world, you can add another level, increasing the potential size of the world by 1024 times without a significant memory hit. Here’s a diagram showing how this might look for a 4×4 grid.
The player starts in a 4×4 world but travels north, across the edge of the grid. Two things happen:
a new 4×4 grid is created for the player to exist in
a new 4×4 grid is created to hold the other two together
Each time the player exits the allocated world, we step up one level and create a larger grid which contains smaller ones. In this way, we should be able to model an arbitrarily large world. In the diagram, blue squares represent tiles, while each non-blue square represents a single empty memory pointer. You can see how there are fewer memory pointers this way.
Despite the grid-like appearance, you would still implement this as a tree. The same diagram in tree format would look like this.
However, since this world is generated procedurally, I’m not sure you need this kind of data structure at all. Can’t you just send the positions to a “procedure” and get the type output? Is this a really expensive operation so you are trying to cache?
Thanks for all of the information! I may use something similar to that grid/tree system to avoid overlapping tiles. And while the 64x64 blocks could be generated on they fly (not a very intense process) the 512x512 blocks contain cities, which is a more intensive process. Also I will probably be using dictionaries then, if it will save me on RAM.
It’s strange though. Let’s say I have 4 2048x2048 cells. Each tile in the cell takes up four bytes. that is
4(2048x2048) x 4 = 4 x 4,194,304 x 4 = 67,108,864 bytes. That is about 67.11MB of information. Yet it shot my memory usage way up to 1GB+ from its resting 340MB.
I’ve never done procedural generation so I could be way off base here but…
I would think you would rather keep a small “floating” matrix of tiles in memory for rendering purposes but once the player got too far away, the game would throw out the data and just regenerate it again later if necessary. That means you would need a matrix just large enough for the draw distance. When a player moved, tiles too far behind them would drop off the matrix and be forgotten and new tiles in front would be generated.
Sounds pretty odd. Is this done with arrays or a List? Maybe show some code that reproduces this.
Well what happens is we store that data for a large grid while the player is in the area, because we need to know the boundaries of that big cell they are in, however we only render what is just around them. The level of detail is greater the smaller the cell, which is why the smallest cells are the only ones actually rendered.
And it is odd, I will get some code posted up:
Tile class (The 4 byte data structure):
public class Tile
{
public Tile()
{
id = 0;
sid = 0;
trot = 0;
mrot = 0;
}
public Tile(byte _id, byte _sid, byte _trot, byte _mrot)
{
id = _id;
sid = _sid;
trot = _trot;
mrot = _mrot;
}
public byte id;
public byte sid;
public byte trot;
public byte mrot;
}
The cell holding it:
public class Cell2048
{
public Tile[,] tiles = new Tile[2048,2048];
public void Generate(int x, int y)
{
for (int i=0; i<2048; i+=1)
{
for (int ii=0; ii<2048; ii+=1)
{
tiles[i,ii] = new Tile(1, 0, 0, 0);
}
}
}
}
Also, interestingly enough, when I switched over to a dictionary I ended up using WAY more memory. I took 4 of my 1024x1024 cells and filled them each half-way. Using a 1024x1024 array in each I got a total of around 230MB used and 430.8MB reserved in the profiler. With a dictionary I got 430-450MB used and .6GB reserved.
Creating an array allocates the amount of memory needed for the entirity of that array. Each element will take up the amount of space needed for the value put in there.
This means reference types will take up the size of its reference pointer value (32-bits/64-bits like eisenpony pointed out earlier).
If it’s a struct, each element will take up what is needed for that struct/prim. I’m not sure about Mono, but .Net will usually pack them nicely if you format your struct correctly. For example a byte array will take up 1 byte per element, rather than the usual 4. See StructLayoutAttribute for more info on proper struct formatting when you need to make them compact.
In the case of what you want. Arrays are NOT sufficient for what you want to do. Linked collections are what you use for this. Such as:
The reason these are useful is that each element takes up a smidge more space in memory. BUT your collection is only made up of the number of elements that you need.
It also comes with much faster inserts, resizing, and the sort. Usually at an increase in read cost (retrieving a value at some position usually requires enumerating over the collection). Usually how you deal with this is desing your algorithm to not need to access specific positions/indexes as much, and instead reference objects and find neighbours based on the collection’s structure.
You can of course design more complex linked structures that aren’t as one dimensional as the ones supplied by Mono/.Net. They merely exist as simple implementations for reuse.
It is so strange, but so far all of these solutions are taking much more memory than the arrays with empty elements. That is taking about 121.1MB, with the second most efficient being the dictionaries with 360MB.
Here was my dictionary implementation:
public abstract class Cell
{
public abstract int size { get; }
private Dictionary<IntVector2, Tile> tiles = new Dictionary<IntVector2, Tile>();
public void SetTile(int x, int y, Tile tile)
{
if (x >= 0 && y >= 0 && x < size && y < size) //if we are within the cell's limits
{
if (!tiles.ContainsKey(new IntVector2(x, y))) //doesn't already contain a tile at that location
{
tiles.Add(new IntVector2(x, y), tile);
}
else //need to override the old one
{
tiles[new IntVector2(x, y)] = tile;
}
}
}
public Tile GetTile(int x, int y)
{
if (tiles.ContainsKey(new IntVector2(x, y))) //contains a tile at that location
{
return tiles[new IntVector2(x, y)];
}
else //no tile at that location
{
return new Tile(0, 0, 0, 0);
}
}
}
That is the base class for all the different sized cells. The only code in the cell classes is overriding the size getter to whatever size the cell is.
Dictionary’s are ordered in arrays of prime number size. And use link lists as buckets to store colliding hashed entries.
Worse that linked list for the bucket in the array stores the hash (key), the value stored, and the next node in the list.
So yeah, a Dictionary isn’t going to save you much space, especially if the hash for each entry is simplistic and collides a lot.
Also, I bet you didn’t try the linkedlist, sortedset, or a custom binary tree, or other linked collection… because there is NO way it would use more memory than a 2048x2048 array.
Basically… stop trying to sovle your problem as an array. And solve it as a set.
Do you organize the objects in your house in an array? I don’t think so… the coffee table is by the couch, across from the tv, in the living room, just down the hall from the kitchen and the front door.
First time I’ll have to disagree with you @lordofduct . Just watch out for those pesky IndexOutOfLivingRoomException and DomesticArgumentException…
Still not sure why you need to record these values at all. I would think the procedural generation would return the same value every time you asked for a specific position. Every time you call this sample implementation of GetTileAt using the same coordinates, you will end up with the same result.
int salt;
public void Initialize(int seed)
{
Random.seed = seed;
salt = Random.value * Int32.MaxValue;
}
public Tile GetTileAt(int x, int y)
{
Random.seed = (x << 16 ^ salt << 16) ^ (y ^ salt >> 16);
return GetTileFor(Random.value);
}
private Tile GetTileFor(float randomValue)
{
if (randomValue < .25)
return new RiverTile();
if (randomValue < .5)
return new MountainTile();
return new BoringTile();
}
So, you could have two or more of these implementations. One is a high level decision maker. It decides what kind of terrain should be found in this general area. It returns a new TileFactory class which can produce those kinds of terrain tiles. This TileFactory class will have a similar implementation and will return random tiles of the correct terrain.
How about “the coffee table is in the living room” and leave it at that? I thought this question was answered pretty well by Lysander, so I didn’t see a point in belaboring it further. Use a quadtree. It’s one of the best data structures for 2D sparse spatial relations, there are writeups all over the place (eisenpony wrote a good one up above), and there’s existing, tested code using list structures like lordofduct suggests. This is problem was solved decades ago, and I only say that to reassure you that there’s a proven solution out there with plenty of example code and explanations that you can give you a head start.
I imagine that some of the data is dynamic, such as mobs moving around the world.
And a quadtree is a linked collection where every node has 4 links on it.
I agree with Lysander’s solution.
OP came back asking about how you’d access something at some index. I was pointing out that you don’t need to reference by index, and instead by relational location. Coffer table in the living room is a relational location, just like my next to the couch example.