Does string's GetHashCode() always return the same value across sessions on iOS and Android?

If I build an iOS app and an Android app with the same Unity version, can string's GetHashCode() be relied on to return the same value on both platforms in every session?
In other words, I want to use a Dictionary<string, xxx> and traverse it with foreach: is the order guaranteed?

If you care about ordering, use the specialized collections that guarantee it, such as this:

Thank you for your reply :)
However, first, System.Collections.Specialized.OrderedDictionary is non-generic, which involves boxing/unboxing. Second, OrderedDictionary is slower than a plain Dictionary if I need to call Remove() frequently, because OrderedDictionary has to remove the entry from its internal array, which is O(n) (a plain Dictionary's Remove is O(1), since it's a hash table).

For my requirement, I don't need the traversal order of the dictionary to match insertion order (like a List). What I need is for the traversal order at runtime to be the same across different mobile devices, which I think requires GetHashCode to always return the same value on those devices.

Object's GetHashCode() uses the memory address, and UnityEngine.Object's GetHashCode() uses m_InstanceID, so I definitely cannot assume the same order for Dictionary<Object, xxx> or Dictionary<MonoBehaviour, xxx>.

But I guess the traversal order of Dictionary<int, xxx> should always be the same, since an int's GetHashCode() should always just return its own value.
And I would like to ask about string's GetHashCode(): is the code for iOS and Android always the same? This is what I saw in Visual Studio on PC:
[screenshot of the string.GetHashCode() source as shown in Visual Studio]
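For what it's worth, in the .NET implementations I've looked at, Int32.GetHashCode() simply returns the value itself, which is easy to check (though even this is an implementation detail rather than a documented contract):

```csharp
using System;

class IntHashCheck
{
    public static bool HashEqualsValue(int n)
    {
        // In current .NET runtimes, Int32.GetHashCode() returns the value itself.
        return n.GetHashCode() == n;
    }
}
```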

You should not rely on any particular implementation of GetHashCode, nor on any particular traversal order of a Dictionary; those are implementation details that are not guaranteed to stay the same from version to version or device to device.

You said you need to add/remove elements "frequently". However, how many elements are in the dictionary at the same time? O(1) can be misleading, because for a "small" number of elements just using a List can actually be faster. O(1) just means constant time; it does not tell you anything about the actual cost, only how the cost scales.

Why exactly is the consistency of the order that important to you? Any insight?

Short answer: No.

I wouldn’t ever be able to sleep comfortably with the assumption that string hash code implementations were deterministic across all environments where my app might run. However, even if I had a 100% guarantee that strings always evaluated to the same hash code, Dictionary will not keep them in a deterministic order anyway.

Dictionaries use two internal arrays, an Entry array and an int array called buckets. Each Entry is a struct with the key, the hashcode of the key, the value, and an int pointing to the next entry in the entries array. Each entry can point to any other index in the array of entries. Originally, each entry just points to the next, but these can get totally scrambled up as you use the dictionary. The entries with nothing assigned are like a linked list of empty entries, and this is how the entries array is initially set up.

As you add entries to the dictionary, the hash code of the key is run through the modulus operator to get an index that falls within the capacity of the dictionary. That index does not point into the entries array; it points into the buckets array of int indices. The bucket points to the tail of the internal linked list, within the entries array, of objects with that modulated hash code.

When adding, the dictionary walks that mini linked list of similarly hashed objects, and if none of them is truly equal to the new key, it places the new object in a slot taken from the tail of the linked list of free spaces in the entries array. It then points that bucket at the new object, and points the new object at the last similarly hashed object the bucket previously pointed to, which may simply be -1 if that modulated hash bucket was empty.

There can be many of these linked lists interwoven throughout the entries array, along with the linked list of all the empty elements. As keys are added to and removed from the dictionary in any order, the way each entry points to the next can get completely scrambled away from simply being the next element in the entries array. The keys hash into the buckets array, which is basically just a pointer to wherever the tail of the similarly hashed objects currently sits in the entries array, and each element can link to another element anywhere in the array. So each "bucket", and the list of empty spots, can end up interwoven in any order throughout the Entry array depending on how you add and remove things.

It’s a bit complicated to wrap your head around when you’re reading through the source code, but it’s a fun and rewarding exercise. Suffice it to say: no, the foreach enumeration of key-value pairs has no guarantee of order. It simply starts from index 0 of the Entry array and yields each slot, in index order, if an object is actually stored there. Because the linked list of empty spots can get scrambled by adding and removing keys, and a new entry can end up in any freed-up spot in the Entry array regardless of its hash code or bucket, you have absolutely no way of knowing what order they will be enumerated in.
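To make the buckets/entries scheme above concrete, here is a heavily simplified sketch. It is illustrative only: it omits resizing, the free list for removed entries, and many edge cases, and the names are mine, not the BCL's:

```csharp
using System;

// Simplified sketch of the buckets/entries design described above.
// Not the actual BCL source; no resizing, no removal/free list.
class MiniDictionary
{
    struct Entry
    {
        public int HashCode; // cached hash of the key
        public int Next;     // index of next entry in this bucket's chain, or -1
        public string Key;
        public int Value;
    }

    readonly int[] buckets;   // bucket -> index of the chain's most recent entry, or -1
    readonly Entry[] entries; // entries are appended sequentially as keys are added
    int count;

    public MiniDictionary(int capacity)
    {
        buckets = new int[capacity];
        entries = new Entry[capacity];
        for (int i = 0; i < capacity; i++) buckets[i] = -1;
    }

    public void Add(string key, int value)
    {
        int hash = key.GetHashCode() & 0x7FFFFFFF; // strip the sign bit
        int bucket = hash % buckets.Length;        // pick a chain
        // New entry links to the previous chain head (or -1 if the bucket was empty).
        entries[count] = new Entry { HashCode = hash, Next = buckets[bucket], Key = key, Value = value };
        buckets[bucket] = count;                   // new entry becomes the chain head
        count++;
    }

    public bool TryGetValue(string key, out int value)
    {
        int hash = key.GetHashCode() & 0x7FFFFFFF;
        // Walk the mini linked list of similarly hashed keys.
        for (int i = buckets[hash % buckets.Length]; i >= 0; i = entries[i].Next)
        {
            if (entries[i].HashCode == hash && entries[i].Key == key)
            {
                value = entries[i].Value;
                return true;
            }
        }
        value = 0;
        return false;
    }
}
```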

The reason I need a consistent dictionary order is that I'm using deterministic calculation ("lockstep") for PvP synchronization. The keys/values added to and removed from the dictionary are guaranteed by my framework, and the order they are added/removed in is guaranteed too. I mean, if I simply changed Dictionary to List, there would be no problem.
However, I have some Add/Remove calls but many GetValue and traversal calls on the container. From a performance standpoint, Dictionary is better than List here.

Thank you very much for your explanation!
However, I still would like to ask: which part of the algorithm you mention involves randomness? I mean, if I add/remove the same keys/values into a dictionary in a fixed order, over many runs, which part is non-deterministic?
I know we can never know the final foreach order, and that the order is scrambled; but if GetHashCode is deterministic, is the order deterministic too?

If I compile this code with one specific Unity version and run it on different platforms, can the foreach order differ?

var dic = new Dictionary<int, int>(); // or Dictionary<string, int>
dic.Add(1, 1);
dic.Add(2, 2);
dic.Add(3, 3);
dic.Remove(3);
dic.Add(4, 4);
// and so on
foreach (var entry in dic)
{
    // I think the order here should be scrambled, but fixed?
}

If you add all of the keys up front, in exactly the same order each time, and don't remove any or add new ones, it's a possibility they may always iterate in the same order.

But, as Bunny pointed out, that would just be a convention of the current implementation, not a language specification. They could change the way Dictionary is coded in a future version of the language, and then when upgrading to some future version of Unity it could alter the expected behavior.

If you were going to try to make it work I would recommend always using the same keys, adding them in the same order, and not removing any keys or adding new ones later. I would also specify the capacity argument when creating the Dictionary so that none of the internal arrays have to resize while doing this. Then, I would create unit tests to verify the expected behavior.
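A minimal sketch of such a check (plain C#, not tied to any test framework; note that passing only demonstrates determinism within a single process and build, not across devices or runtime versions):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class OrderCheck
{
    // Builds a dictionary with a fixed sequence of operations and
    // returns the keys in foreach order.
    public static List<int> BuildAndEnumerate()
    {
        var dic = new Dictionary<int, int>(16); // fixed initial capacity, as suggested
        for (int i = 0; i < 10; i++) dic.Add(i, i * 10);
        dic.Remove(3);
        dic.Remove(7);
        dic.Add(100, 1000);
        return dic.Keys.ToList();
    }
}
```

A test would then assert that two independent builds enumerate identically; repeating this with your real key set would give you at least some confidence for a given build.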

But my mind starts to wander toward some other solution that doesn’t even use a Dictionary, given such specific constraints. It’s just the sort of half-baked mental wandering where I don’t know exactly what I’m thinking, but hashing a string, doing a modulus, keying into one array, which keys into another, which may then have to trace through a miniature linked list and do equality checks on the strings… My mind wonders if there’s not an even better way to translate a string key directly to an array index, even faster, with a custom solution or collection of your own. But, like I said, just mental wandering.

Thanks for your recommendation :)
But why do you think the fixed order will be broken by "removing any and adding new ones"? If the capacity of the buckets/entries arrays is always initialized to the same size, and the capacity-expansion algorithm is fixed, the order is still fixed after many adds and removes.
For my case, I just need players to stay in sync within the same game version. I don't expect a player using an old game version compiled with an older Unity version to play against another player with a newer game version compiled with a later Unity version.

I think it would be safe to change to an OrderedDictionary (internally: an array for traversing plus a dictionary for fast GetValue) instead of Dictionary, but I just feel that Dictionary<int, xx> and Dictionary<string, xx> are already safe under my usage.

Are you sure about that? You haven’t mentioned how many values you expect to have in the dictionary at the same time. Since you use a string as key it means the GetHashCode method will actually iterate through every letter in the string in order to calculate the hash. So if you use relatively long strings this is actually quite expensive.

When the order of the elements is irrelevant, adding and removing items from a List is also O(1) and is actually cheaper compared to a dictionary, way cheaper. To remove an element in O(1) from a List you just replace the item you want to remove with the last element of the list and then remove the last element. This is O(1) and has less overhead than a dictionary.
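The swap-remove trick just described, as a sketch (`SwapRemoveAt` is a hypothetical helper name, not a BCL method):

```csharp
using System;
using System.Collections.Generic;

static class ListUtil
{
    // Removes the element at 'index' in O(1) by overwriting it with the
    // last element and trimming the tail. Element order is NOT preserved.
    public static void SwapRemoveAt<T>(List<T> list, int index)
    {
        int last = list.Count - 1;
        list[index] = list[last]; // overwrite the removed slot with the tail element
        list.RemoveAt(last);      // removing the last element is O(1)
    }
}
```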

Unless you have more than several hundred elements in your dictionary, I highly doubt that the performance of a Dictionary is better than a List. You may want to profile that with your actual keys. What do the keys look like? Do they share a long common prefix, or are they distinct from the start, so that string comparison would exit early after the first couple of characters? Many things play into this.

What exact problem do you want to solve? You still haven’t given any numbers and it seems you just assume that a Dictionary will be better. So it seems like premature optimisation 101.

The reason for using Dictionary originally, as always, is that there are many lookup calls. Dictionary lookup is O(1), and List lookup is O(n). The same goes for Remove, since you need to find the element first and then remove it.
In more detail: for a Dictionary lookup you do need to call GetHashCode on the string parameter and iterate over every character, but that happens once. For a List lookup you traverse every string in the list one by one, comparing them (iterating over every character) until you find the equal one; that's up to n comparisons.
So, in general, Dictionary is faster than List. Or just consider Dictionary<int, xxx> vs List, where string's GetHashCode doesn't come into it at all.
Anyway, my question is whether I can rely on the traversal order of a Dictionary if I always add/remove the same keys/values in the same order. I would like to know whether Dictionary<int, x> or Dictionary<string, xx> can still keep my deterministic calculation architecture working.

As we have explained several times now, that cannot be guaranteed: the behaviour is not documented and is subject to the implementation details of the Dictionary, which could change between frameworks and framework versions. It "may" be consistent at the moment, but it isn't something you should rely on.

No, that's a general misconception. As I said, it depends on the "n". Again, Big-O notation does not tell you anything about the actual runtime or performance; this is such a common misconception. Big-O notation only tells you how the performance changes as the number of elements changes. An O(n) algorithm grows linearly: if it takes x amount of time for y elements, then for 2y elements it takes about 2x.

We still don't have even a rough estimate of how many entries your dictionary will hold at a time (min, avg, max).

Since you seem set on using Dictionary, just to be clear: the order, and the influence of add/remove operations on the order, is not guaranteed, so you should not rely on it. You can if you want to take the risk; that's up to you. You could also write your own implementation of a Dictionary so you can guarantee that it never changes. We just tried to offer potential alternatives.

Yes, between frameworks the order may change since the code may change; I know this. However, I'm asking about building the app with ONE specific Unity version, which means always the same .NET framework at the same version. I checked the Dictionary source code, and I didn't find any logic that makes the order "random": once GetHashCode is fixed, the order should be fixed (though scrambled), because the position of an entry is decided by the bucket index (GetHashCode() % bucketSize) and by the order of the linked list inside a bucket (which follows the fixed add/remove order).
As for documentation, the docs mainly say that the order is unspecified, but that's different from my case; a scrambled order is fine for me. Plus, I'm only considering Dictionary<int or string, xx>, not dictionaries whose keys have non-deterministic GetHashCode implementations, such as Dictionary<object, xx> or Dictionary<UnityEngine.Object, xx>.

So, regarding string.GetHashCode(): I'm just looking at the first search results for the C# string source, which may not even be the exact implementation you end up with in your app, but the results don't look deterministic here:

It’s got platform dependent compilation built right in, like:

#if WIN32
                    int hash1 = (5381<<16) + 5381;
#else
                    int hash1 = 5381;
#endif

I went back to look at the source for Dictionary again. Like I said, this is just what I found by searching the web, not necessarily how it will always work. That said, when you add keys to the dictionary, I think they will be added to the Entry array in sequential order, starting at index 0 and going forward 1, 2, 3… etc. So it may actually work as you intend, if you add all of the expected keys up front, in order, and don't remove any of them.

But if you ever do remove any keys, the dictionary starts building a linked list of empty spots that used to be occupied, earlier in the Entry array than the count value, which points at the next never-assigned slot.

So, when you go to add a new key, it will not be added sequentially. It will be added to the last freed spot, somewhere earlier in the array, that used to have a key in it, but was removed.

So, if you add these keys:

MyDictionary = new Dictionary<string, int>();
MyDictionary["zero"] = 0;
MyDictionary["one"] = 1;
MyDictionary["two"] = 2;
MyDictionary["three"] = 3;
MyDictionary["four"] = 4;

Then iterating them will probably return those key-value-pairs in order.

If you remove these keys:

MyDictionary.Remove("one");
MyDictionary.Remove("three");

And then you add these keys back again:

MyDictionary["one"] = 1;
MyDictionary["three"] = 3;

Then, when you iterate the key-value-pairs, the keys you get will probably be in this order: “zero”, “three”, “two”, “one”, “four”. And that’s really what I was trying to say before about the order getting scrambled up. Maybe you can make it work for you, but I would test it thoroughly. Well, honestly I probably just wouldn’t depend on it personally. I would try to find another way.
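The sequence above can be sketched as code. Keep in mind that the resulting iteration order is an implementation detail of the runtime's Dictionary, so the code below deliberately does not assert any specific order:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class ReorderDemo
{
    // Reproduces the add/remove/re-add sequence described above and
    // returns the keys in foreach order. The exact order depends on the
    // runtime's Dictionary implementation; only the key set is guaranteed.
    public static List<string> Run()
    {
        var d = new Dictionary<string, int>
        {
            ["zero"] = 0, ["one"] = 1, ["two"] = 2, ["three"] = 3, ["four"] = 4
        };
        d.Remove("one");
        d.Remove("three");
        d["one"] = 1;    // likely lands in a freed slot, not at the end
        d["three"] = 3;
        return d.Keys.ToList();
    }
}
```

Running this and printing the result on your actual target runtimes is an easy way to see the "scrambling" for yourself.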

Now, if you are going to use int values as keys, you might be able to just have an array where all of the ints just are indices that key directly into an array. I don’t know if that works for what you’re trying to do, but sometimes you can find a simpler way to get the job done.

You’re hoping for determinism. Meaning entries will be added and removed in the same sequence so any misordering in the buckets would be deterministic as they’d be added/removed to their respective bucket in the same sequence.

Also, let's assume Unity uses the exact same .NET library when compiling for each platform (it may not be… that'd be a completely different question from the one originally asked… but for the sake of discussion let's assume it does).

The only non-deterministic aspect is the fact that documentation wise GetHashCode for string is not guaranteed between devices. MSDN specifically states it:

With that said, Dictionary specifically has a constructor overload that takes an IEqualityComparer for the case of altering how hashcode’s and equality in the collection is performed:
Dictionary<TKey,TValue> Constructor (System.Collections.Generic) | Microsoft Learn

You could just write your own comparer that calculates the hash code deterministically. And assuming the Dictionary implementation is exactly the same between platforms in Unity, your dictionary should then be deterministic.
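As a sketch, such a comparer might use a simple, well-defined hash like 32-bit FNV-1a over the string's UTF-16 code units (the class name is mine; the constants are the standard FNV-1a parameters):

```csharp
using System;
using System.Collections.Generic;

// A string comparer with a deterministic, platform-independent hash
// (32-bit FNV-1a over the UTF-16 code units), so the hash no longer
// depends on the runtime's string.GetHashCode implementation.
sealed class DeterministicStringComparer : IEqualityComparer<string>
{
    public bool Equals(string x, string y)
    {
        return string.Equals(x, y, StringComparison.Ordinal);
    }

    public int GetHashCode(string s)
    {
        unchecked
        {
            uint hash = 2166136261u;   // FNV offset basis
            foreach (char c in s)
            {
                hash ^= c;             // mix in each UTF-16 code unit
                hash *= 16777619u;     // FNV prime
            }
            return (int)hash;
        }
    }
}
```

Usage: `var dic = new Dictionary<string, int>(new DeterministicStringComparer());`. This removes the hash-function variable, but note that the Dictionary's internal ordering logic itself would still have to match across platforms for the iteration order to be identical.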

Of course… more testing would be required and I’d argue this is a potential hotspot.

For my own sanity I’d prefer implementing my dictionary with my own code guaranteeing that the implementation matches ALWAYS regardless of version. This way if say you release version 1.1 of the game and they play versus someone in 1.0 there isn’t a chance that 1.1 built on a newer version of unity has a newer version of dictionary and therefore is no longer deterministic.

Dictionary is defined as being unordered, so therefore it should be treated as unordered.

I feel this way about a lot of things… I’d do the same with my RNG as well (and have: spacepuppy-unity-framework-4.0/Framework/com.spacepuppy.core/Runtime/src/Utils/RandomUtil.cs at master · lordofduct/spacepuppy-unity-framework-4.0 · GitHub)

Fair enough. Thank you for your guidance.
Originally, I was actually leaning toward writing my own. But I was looking for more evidence to confirm whether it was strictly necessary, because my situation is different from what the documentation generally covers. For example, a scrambled order is totally fine as long as it's deterministic, and I only need it to work for one Unity build targeting iOS and Android; I don't allow players to play against each other with game versions built from different Unity versions.
What I'm doing now is writing my own OrderedDictionary, with a macro in the class that lets it switch to a plain Dictionary, so I can test the differences in determinism and performance between them.