Loop count with letters

A little context for my problem.

I was using a string as a key to uniquely identify one of many hundreds of thousands of objects.
But for reasons unnecessary to mention, I had to drop strings.
I wanted to use Int32, but I found out I might run out of incremental IDs. Not because of the number of objects in the scene, but because of how many unique items are spawned and destroyed.
I wanted to use Int64, but then I found out it would be insanely costly memory-wise with massive Dictionaries.

So my question is whether there is an already beaten path for counting using digits, lowercase letters and uppercase letters:

0,
1,
2,
3,
…
9,
a,
b,
c,
…
A,
B,
C,
D,
E,
F,
G,
…
Z,
00,
01,
02,
…
0a,
0b,
0c,
…
Ac,
…
Bd,
ZZ,
000
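Expressed as code, the counter I have in mind would be something like this rough sketch (note: this is plain positional counting, so it rolls over to “10” after “Z” rather than the “00” above, but the idea is the same):

```csharp
using System;
using System.Text;

// 0-9, then a-z, then A-Z — 62 symbols, matching the ordering above.
const string Alphabet = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

// Encode a non-negative counter as a string over the 62-symbol alphabet.
string Encode(long value)
{
    if (value == 0) return "0";
    var sb = new StringBuilder();
    while (value > 0)
    {
        sb.Insert(0, Alphabet[(int)(value % 62)]);
        value /= 62;
    }
    return sb.ToString();
}

// Decode such a string back to a long.
long Decode(string s)
{
    long value = 0;
    foreach (char c in s)
        value = value * 62 + Alphabet.IndexOf(c);
    return value;
}

Console.WriteLine(Encode(61));   // "Z"
Console.WriteLine(Encode(62));   // "10"
Console.WriteLine(Decode("ZZ")); // 3843
```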

This would make IDs significantly shorter as well as spare memory.
I can fit many more IDs into this format than into an integer.
Not to mention an integer is limited, whereas a string can go on more or less forever.

I need something that wouldn’t be painful for the performance of my game. I wanted to search for it, but I don’t even know what to search for, because I don’t know what this thing is called.

I’ve managed to find a single something about C++ here: c++ - How to Loop program that counts, digits and Letters? - Stack Overflow
But that’s not something I can use with Unity. There are a couple of Python and Matlab answers, but I can’t program in either. I tried to find C# equivalents, but I got questions about parsing (and different ways of looping over) the alphabet instead.

This is going about the problem in an absurdly convoluted way. Stick with numbers, which you can just ++ and be done with it.

Int32 goes up to a couple of billion, far larger than “many hundreds of thousands”.

If you need more, long goes up to 9,223,372,036,854,775,807.

If you need more… frankly, you’re wrong. If you spawned a million objects every frame at 60fps and ran the app constantly for 24 hours, you’d reach 5,184,000,000,000. You’d have to run this for about 5,000 years, constantly, always running at 60fps, before running out of unique identifiers.
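As a quick sanity check on that arithmetic (just the numbers above, nothing more):

```csharp
using System;

long perFrame = 1_000_000;               // objects spawned per frame
long perDay   = perFrame * 60 * 86_400;  // 60 fps * 86,400 seconds/day = 5,184,000,000,000
double years  = long.MaxValue / (double)perDay / 365.0;

Console.WriteLine(perDay);  // 5184000000000
Console.WriteLine(years);   // roughly 4,875 — i.e. "about 5,000 years"
```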

2 Likes

Are there any memory/performance drawbacks to long?

Is there more intense and complicated math going on under the .NET framework’s hood, or Unity’s hood, that would choke my application when it’s being read/written/looked up/indexed?

long is a 64-bit integer, so as long as you’re running this code on a 64-bit device (which is just about all of them today), its performance ought to be identical to int in terms of CPU cycles (bearing in mind I haven’t profiled this or anything), and string comparison is notoriously slow compared to int comparison (as it’s basically a series of int comparisons). In terms of memory, longs (8 bytes) will of course take twice as much space as int (4 bytes), but strings take 20+(n/2)*4 bytes according to StackOverflow.

Is there general knowledge on how much a Dictionary of size 1 million, with long as its key and null as its value, will weigh? Is there a general way to predict that? I would like to personally compare every method against every other method and see it with my very own eyes.

Where n is the length of the string?

Strings will always take more space than numbers because strings are just encoded into bytes, same as numbers, but with extra considerations.

Each character of a string will use either 8, 16 or 32 bits, depending on what kind of encoding you use. If we assume UTF-8, then using the characters 0-9, a-z and A-Z gives you 62 possible values for each 8 bits of memory. But just sticking with integers gives you 256 possible values for each 8 bits of memory.

If you included all of the extra characters in extended ASCII, then you could achieve the same efficiency as just using integers, but not better. Moreover, using all of those characters would be even more mind-boggling: how would you represent the non-printing characters?

So, as StarManta said,

The memory footprint of a Dictionary will come primarily from the keys and the associations. Each association should be either 32 bits on a 32-bit architecture, or 64 bits on a 64-bit architecture. If you choose a 64-bit integer as the key, then you’re looking at n * 128 bits.

A million entries = 128 million bits, approximately 16MB.
That’s going to be a drop in the bucket compared to how much memory the rest of your game will need.
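Spelled out, that back-of-the-envelope estimate (under the 64-bit key + 64-bit association assumption above) is:

```csharp
using System;

long entries      = 1_000_000;
long bitsPerEntry = 64 + 64;                    // 64-bit key + 64-bit association
long totalBytes   = entries * bitsPerEntry / 8; // 16,000,000 bytes

Console.WriteLine(totalBytes / (1024.0 * 1024.0)); // ~15.3 MiB, i.e. roughly 16 MB
```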

Now for the important bit that everyone ignores: It’s important not to dwell on these kinds of performance considerations early on. Get things running first. Find and fix as many bugs as you can. Then, and only then, use performance tuning tools, like the profiler, to see if you can make things tighter.

You do know that Unity uses an int as the ID key for all of its objects in the game, a la GetInstanceID, which returns an int. So in theory, if you’ve created more than 2^32 objects in Unity, you’ve consumed Unity’s own instance ID space.

Furthermore, a string will take up more memory than a long/Int64. A long/Int64 is 8 bytes in size no matter the value: 0 is 8 bytes, 4890432294032 is 8 bytes. A string, on the other hand, is roughly 20 bytes + 2 * length. Basically, your smallest string will be 20 bytes, and it grows from there. That means even the smallest of strings (empty?) will be 12 bytes larger than a long/Int64.

As for the memory consumption of a Dictionary with a key type of long: well, that depends on a handful of things… what your key type is, what your value type is, and which runtime you’re running. I believe the version in 2019.x has 4 arrays in it, each of length equal to a prime > capacity. The 4 arrays are of type int, Link, TKey, and TValue (where Link is a struct of 2 ints, so about 8 bytes). So given TKey and TValue are only 4 bytes each, we’re talking ~20 bytes * size of dictionary (with extra padding). If TKey or TValue are class/ref types, you’ll also need to consider the size of those objects on top of this.

So in the case of a string, since it’s a class/ref type, it’s going to be the 20 bytes * size plus the memory taken up by all the strings, unlike a long/Int64, which can be stored directly in the TKey array.

Furthermore, if memory is a concern, don’t use a Dictionary. If you are using something like ints/longs, you could probably come up with a more compact collection, like a binary tree, that has fast lookups like a Dictionary (though maybe not as fast, depending on size). But I wouldn’t be able to give a best use case without knowing what it is you’re actually trying to store and how you intend to use it.

TLDR;

int SHOULD be enough

if it’s not, use a long

string is dumb

There is NO situation in which using strings to represent natural numbers will save memory or improve performance compared to using integers. None. ints are optimized for representing integers; using another data type (like string) that is NOT optimized for that purpose is just never going to be more efficient.

In the unlikely event that you ever find a case where you really do need integers bigger than Int64, C# has a special built-in type for representing arbitrarily large integers (System.Numerics.BigInteger), where the only limit to how high you can count is your computer’s memory.
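For reference, the type in question is System.Numerics.BigInteger. A quick sketch:

```csharp
using System;
using System.Numerics;

// BigInteger grows as needed; the only hard limit is available memory.
BigInteger big = BigInteger.Pow(2, 128); // already far past long.MaxValue
big += 1;

Console.WriteLine(long.MaxValue); // 9223372036854775807
Console.WriteLine(big);           // 340282366920938463463374607431768211457
```

Bear in mind it is noticeably slower and heavier than a plain long, so it’s a last resort, not a default.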

But I’ve been programming for decades and I can count on one hand all the times I’ve ever wanted something like that. As others have already pointed out, if int32 isn’t enough for indexing your game objects, your plan has bigger problems.

3 Likes

I’ll add one more version of that: dwelling on this early on will make your codebase a nightmare.

Respectfully disagree sir. I’d way rather have a debug log full of semi-meaningful strings than just random numbers.

In the use case OP posted, I would do something like “semantic string” + staticIncrementor.ToString()
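For illustration, that pattern might look something like this (names are hypothetical, just a sketch):

```csharp
static class Ids
{
    static long _counter; // the "staticIncrementor"

    // Builds a human-readable ID: NextId("Enemy_") -> "Enemy_0", then "Enemy_1", ...
    // The counter is shared across all prefixes, so every ID stays unique.
    public static string NextId(string semanticPrefix)
        => semanticPrefix + (_counter++).ToString();
}
```

So a debug log ends up full of entries like “Enemy_42 died” instead of just “42 died”.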

1 Like

Mind you I’m not saying “string is dumb” in the context of always. I’m saying it in the context of “saving memory”.

You did quote a snarky TLDR at the end of a long post after all.

Cause at the end of the day, “A” is just as meaningless as 36 (which is the number OP was hoping to replace with “A”).

Fair enough, I think I interpreted your assertion over-broadly. My bad.

In other news, I am trying to imagine the sheer scale of what OP was contemplating actually overflowing 2^31 in any reasonable game context, and not having 57000 other problems related to large numbers of things, whatever those things are.

1 Like

Agreed, I still wonder what they hope to be doing.

qcw27710 has posted several threads asking for help doing complicated and unusual things where the recommended answer turns out to be “You really shouldn’t be doing that in the first place. Take a step backward and approach your original problem from a different angle.”

Thumbs-up for trying to find creative solutions to their own problems, but in this case I think the creativity-to-experience ratio is problematically high and they would be better served by more incremental learning.

2 Likes

I salute qcw27710 for sticking with it, and I urge qcw27710 to be on the lookout for “gee, this seems unusually tough to do” situations and use that as a signal to investigate further, which is pretty much exactly what they did with this post. Bravo! Unity and game development can quickly get beastly complicated and abstract!

2 Likes

Note that strings in C# are stored in memory using UTF-16 encoding, which means they are encoded at 2 bytes per character. Quite inefficient for common Latin or numeric characters (UTF-8 uses half the memory for common Latin characters). And that is before all the previously mentioned overhead of being a string in the first place. So ZZZZ would be 8 bytes + overhead. For 8 bytes you can represent a considerably larger number using a ulong.
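You can see this directly with System.Text.Encoding (a small sketch):

```csharp
using System;
using System.Text;

string id = "ZZZZ";
Console.WriteLine(Encoding.Unicode.GetByteCount(id)); // 8 — UTF-16: 2 bytes per char
Console.WriteLine(Encoding.UTF8.GetByteCount(id));    // 4 — UTF-8: 1 byte per Latin char

// Those same 8 bytes as a ulong can hold values up to 18,446,744,073,709,551,615,
// whereas base-62 "ZZZZ" only reaches 62^4 - 1 = 14,776,335.
Console.WriteLine(ulong.MaxValue);
```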

Personally, I think this is quite an easy mistake to make. I made it myself when I was learning. The trap is that, as humans, we see the characters of a string the same way we see the digits of a number. Each character of a string and each digit of a number take one “slot”. The string slots can hold numbers or letters, so that seems a more efficient use of the space, which must mean a more efficient use of memory.

Of course, that’s not how computers represent numbers and strings. But without knowing how strings and numbers are encoded to memory, it’s an easy misconception.

Since encoding has been so well abstracted away by our tools, this knowledge has risen to the level of mid-level programmers, well above the typical hobbyist or beginner.

Considering a variable that would easily and safely fit in Int32, why would I use Int32 over Int64? You read that right. A post earlier in this thread mentioned (more or less guessed) it would be just 16MB for 1 million Dictionary entries (if the value is null), so what would I miss out on?

I could change the type of a specific number of variables that might need it, and set them to long. What would be a reason for me to go from Int64 down to Int32 if it won’t fit in Int16? Memory doesn’t seem to be a problem, and performance-wise it will be better than what I did before (which was stupid string comparison).

Why not just spam Int64 for sizes greater than Int16?

Bonus question: looking at the way Unity and .NET manage numbers, do I need to optimize things down to Int16 and Int8 instead of generously using Int32 (int)?

Int32 is the most commonly used, so many frameworks and utilities only provide methods which accept Int32 parameters. Using an Int64 where unnecessary is not really about memory consumption, though that could become a consideration when you get into the millions of entries. It’s about convenience, utility, reusability, design, etc…

This may sound stupid, but could you expand on that?

Convenience, for me, comes down to the size limitation: Int64 has the size I need; Int32 mostly does.
Utility, I have no idea.
Reusability, I also have no idea.
Design, I still have no idea.

How would I understand utility, reusability and design in the context of Int32 vs. Int64?
Or is this a topic so massive that it’s inconvenient for a message board?