Animator StringToHash() performance, collisions?

Hey everyone!

I know there is StringToHash, but is there a way to access clips as a consistent array so that Unity doesn’t have to compare potentially a lot of strings (or ints) when looking for animations? Or does StringToHash do exactly that by itself?

Also, may there be two results of StringToHash that are the same? Should I check for collisions manually or does Unity check for this and return an alternate hash?

Best wishes,
Shu

The term ‘hash’ in regards to the Animator is referring to a hash value associated with an entry in a hash table.

Essentially an algorithm is defined for generating a hash value. Given an object (in this case a string/animation state) has a hash value calculated for it. Then when placed into the table, the index at which it is placed is determined by the hash (usually as the modulo of the hash with the length of the table, where said length is a prime number to reduce collisions). This allows for fast look up since the index can be determined from the hash (collisions can occur depending the algorithm, in which case buckets are used to determine equality… but the hash reduces the number of equality tests down significantly. Instead of testing all entries, you only test those whose modulo/index equals that of the modulo of another object’s hash).

A simple hash could be as basic as the index in said table/array. In which case buckets aren’t even necessary.

It could be more complex, such as the hash code generated by ‘GetHashCode’ on all .net objects. Which isn’t guaranteed to be unique, and buckets are in use, such as with HashSet and Dictionary<T1,T2>.

In the end though… unity’s algorithm is defined to be unique (as far as I know). It is speedier to look up by hash than by string. So what you should do is in your scripts, on ‘Start’ you get the hash for your string name of the animation state. And then you always call to play an animation by the hash.

This will significantly reduce the overhead of looking up which animation you’re looking for. Since the string would have to be either matched, or the hash calculated for the string. Where as if you have the hash pre-calculated, that overhead is gone.

Here are articles about how hash tables work:

Note I’m basing my assumption of the hash value being unique on this video:

Dude man says it’s unique around 3:45.

4 Likes

@lordofduct
Thanks for your detailed answer!
Concerning performance, it does seem to be the way to go!

About the hash being unique:
What confuses me is that it’s not mentioned in the docs for StringToHash() ( Unity - Scripting API: Animator.StringToHash ), whereas the docs for Shader.PropertyToID() ( Unity - Scripting API: Shader.PropertyToID ) say the ID is unique.
Maybe I will run some tests to check for collision, just in case.
Maybe like this?:

CheckForCollisions.cs

using UnityEngine;
using System.Collections;
using System.Collections.Generic;

public class CheckForCollisions : MonoBehaviour {

string[] alphabet = {
    "a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z"
};
string[] randomNames = new string[10000];
int[] randomNamesHashes = new int[10000];

IEnumerator Start () {
    while (true) {
        RandomNames_Setup();
        SetHashes();
        CheckForCollisions_();
        Debug.Log("10000 done");
        yield return new WaitForSeconds(0.1f);
    }
}

void RandomNames_Setup () {
    for (int i = 0; i < randomNames.Length; i++) {randomNames[i] = "";};
    int nameLength = 0;
    int letterNumber = 0;
    for (int i = 0; i < randomNames.Length; i++) {
        nameLength = Random.Range(5,20);
        for (int j = 0; j < nameLength; j++) {
            letterNumber = Random.Range(0,alphabet.Length);
            randomNames[i] += alphabet[letterNumber];
        }
    }
}

void SetHashes () {
    for (int i = 0; i < randomNamesHashes.Length; i++) {
        randomNamesHashes[i] = Animator.StringToHash(randomNames[i]);
    }
}

void CheckForCollisions_ () {
    for (int i = 0; i < randomNamesHashes.Length; i++) {
        for (int j = 0; j < randomNamesHashes.Length; j++) {
            if (randomNamesHashes[i] == randomNamesHashes[j]) {
                if (i != j) {
                    Debug.Log("COLLISION");
                }
            }
        }
    }
}

}

Unity’s docs can be vague some times.

It gets really annoying.

The thing is… unity makes StringToHash available and suggests you use it. If it caused collisions, they wouldn’t make it available (not that HashSet and Dictionary from microsoft don’t have a way to access by the hash… because they know it collides, that would be bad design).

With that said… who knows… unity forgets to document stuff sometimes. And they’ve made bad api design chocies in the past. They might have made a dumb mistake this time around and offered up a feature that might cause collisions. And maybe those odds are just like 1 in 4 billion or something.

Thing is… I know lots of people use StringToHash, and I’ve NEVER heard of a complaint about it colliding anywhere.

You can run some tests… but if there are collisions, you may not even find them.

3 Likes

Alright, I am making the script run right now. (I edited my last post, I am running the script there on an empty GameObject in a new scene.)
So far, 100 iterations and, indeed, 2 collisions. Thats 1/500,000.
However, I don’t know if the script I wrote is the right way to check.

The script you wrote can create identical strings multiple times, since you’re using Random to create it. Random can create non-unique sequences all the time.

Give me a sec and I’ll create a script that would be more accurate.

1 Like

You’re right! Would this be a solution?

using UnityEngine;
using System.Collections;
using System.Collections.Generic;

public class CheckForCollisions : MonoBehaviour {

string[] alphabet = {
    "a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z"
};
string[] randomNames = new string[10000];
int[] randomNamesHashes = new int[10000];

IEnumerator Start () {
    while (true) {
        RandomNames_Setup();
        SetHashes();
        CheckForCollisions_();
        Debug.Log("10000 done");
        yield return null;
    }
}

void RandomNames_Setup () {
    for (int i = 0; i < randomNames.Length; i++) {randomNames[i] = "";};
    int nameLength = 0;
    int letterNumber = 0;
    for (int i = 0; i < randomNames.Length; i++) {
        nameLength = Random.Range(5,20);
        for (int j = 0; j < nameLength; j++) {
            letterNumber = Random.Range(0,alphabet.Length);
            randomNames[i] += alphabet[letterNumber];
        }
    }
    for (int i = 0; i < randomNames.Length; i++) {
        for (int j = 0; j < randomNames.Length; j++) {
            if (randomNames[i] == randomNames[j]) {
                if (i != j) {
                    Debug.Log("NAMES DOUBLE");
                }
            }
        }
    }
}

void SetHashes () {
    for (int i = 0; i < randomNamesHashes.Length; i++) {
        randomNamesHashes[i] = Animator.StringToHash(randomNames[i]);
    }
}

void CheckForCollisions_ () {
    for (int i = 0; i < randomNamesHashes.Length; i++) {
        for (int j = 0; j < randomNamesHashes.Length; j++) {
            if (randomNamesHashes[i] == randomNamesHashes[j]) {
                if (i != j) {
                    Debug.Log("COLLISION");
                }
            }
        }
    }
}

}

If there are more “COLLISION” than “NAMES DOUBLE”, it’s a collision due to StringToHash()?

Just adding my 2 cents – as already mentioned by @lordofduct , it’s quite possible that there are collisions, but they are handled in buckets. So, they are still resolved very quickly, especially if you have the hash already stored. It’s not going to be bothering you :slight_smile:

1 Like

Here you go:

using System.Collections;
using System.Collections.Generic;
using UnityEngine;

public class zTest02 : MonoBehaviour {


   
    private Dictionary<int, string> _words = new Dictionary<int, string>();

    void Start ()
    {
        System.Threading.ThreadPool.QueueUserWorkItem(this.DoWork);
    }

    private void DoWork(object token)
    {
        const int NUM_OF_WORDS = 10000000;

        int collisions = 0;
        for(int i = 0; i < NUM_OF_WORDS; i++)
        {
            int j = i;
            string word = string.Empty;
            while(j > 62)
            {
                word += IntToChar(j % 63);
                j /= 63;
            }

            word += IntToChar(j);

            int hash = Animator.StringToHash(word);
            if(_words.ContainsKey(hash))
            {
                collisions++;
                Debug.Log("COLLISION: " + _words[hash] + " : " + word);
            }
            else
            {
                _words[hash] = word;
            }
        }

        Debug.Log("COUNT: " + collisions.ToString() + " : " + NUM_OF_WORDS.ToString());
        Debug.Log("ODDS: 1 : " + (collisions > 0 ? (NUM_OF_WORDS / collisions).ToString() : "inf"));
        Debug.Log("RATIO: " + (collisions / NUM_OF_WORDS).ToString());
    }

    public static char IntToChar(int i)
    {
        if(i < 26)
        {
            return (char)(i + 65); //upper
        }
        else if(i < 52)
        {
            return (char)(i + 71); //lower
        }
        else if(i < 62)
        {
            return (char)(i - 4); //numerics
        }
        else
        {
            return '.';
        }
    }

}

I just ran it at 10 million words, and I got 0 collisions.

Note, I ran only alpha numeric values and the period ‘.’. This is because these are really the only characters that should show up as state/parameter names.

Also note, when I change my script to include other characters… collisions start occurring. I did run it with out numerics and I got a few collisions on names that had repeated periods in a row; “…a” for example.

I didn’t run for more, because it takes a while. But to me it seems they try to avoid collisions within the allowed character set. I think the repeated periods caused issues because really you shouldn’t have that (periods relate to nested states… so you should always have a name in between periods, not …).

This says to me there 'might be collisions. But odds are slim. As long as you stay within the approved naming schema.

3 Likes

OK…

I just ran just alpha-numeric (no periods, to rule out the … clashing).

I did 2,147,483,647 words (that’s int.MaxValue).

2 billion words.

I’m getting collisions right now… it’s still running (going to take a while. I’m currently at 10 gigs of memory consumed… DO NOT RUN THIS unless you have a lot of memory in your machine. I have 64 gigs of memory, so I can do it).

They’re still not super common though.

When it’s done I’ll post the odds.

7 Likes

lol

1 Like

Were not talking about bucket collisions.

We’re talking about 2 identical hashes calculating the same for 2 different names from StringToHash.

This sort of collision could cause an issue in Animator.

What I hope to do is after running this, getting clashing string (they seem to calculate the same from run to run). And see what happens if you pass in the hash to Animator.

2 Likes

I just updated the script in my last post so that it actually runs.
205 iterations of this, 10 NAMES DOUBLE, 14 COLLISION.
So 4/2,050,000 chance of collision for this script?

Thanks for testing as well! Interesting to see that it seems to depend on the values used for the names.
Personally, I use “_” a lot. Something like “Throw_Main_Heavy1.”
If chances were at about 1/500000, and I use 200 animations, is the collision chance at 500000/200% = 1/2500 for each time starting the scene?

Wouldn’t Unity just use one of the two animations then?

Yes, I stand corrected. I tried to google this, but came up short. Now I’m curious what will happen, too.

We don’t know that…

Note, I plan to stick those names into an animation controller. There’s several things that could happen.

  1. it just plays 1 of the 2, becuase it just finds the first match.

  2. unity could freak out if you put in a name that hashes the same as another. Causing either a warning message that your names are not valid, or even crashing unity, or who knows what else.

  3. unity’s algorithm may rely on known state names. And by adding entries to an animation controller adjusts the algorithm, rectifying for the collision.

We don’t know… note, we’ve been calling ‘StringToHash’ on names that don’t exist in known animation controllers. So… we gotta see what happens.

Also, I’m making pulled pork right now… so I may be spurratic.

I am interested in seeing where this goes too. Cause yeah, I googled early on as well and noticed no information on it.

That’s what I was thinking. I’m hoping it’s #2 or #3 (with #2 being a warning not a crash lol). At least those are workable solutions. #1 doesn’t sound great.

I changed the script to test on 200 strings at once and added some characters to the alphabet, which is more like a realistic scenario.

using UnityEngine;
using System.Collections;
using System.Collections.Generic;

public class CheckForCollisions : MonoBehaviour {

string[] alphabet = {
    "a","b","c","d","e","f","g","h","i","j","k","l","m",
    "n","o","p","q","r","s","t","u","v","w","x","y","z",
    "0","1","2","3","4","5","6","7","8","9","_","."
};
string[] randomNames = new string[200];
int[] randomNamesHashes = new int[200];

int iterations = 0;
IEnumerator Start () {
    StartCoroutine(UpdateState_Iterations());
    while (true) {
        RandomNames_Setup();
        SetHashes();
        CheckForCollisions_();
        iterations++;
        yield return null;
    }
}

IEnumerator UpdateState_Iterations () {
    YieldInstruction wait1 = new WaitForSeconds(60.0f);
    while (true) {
        Debug.Log("200 done: " + iterations.ToString());
        yield return wait1;
    }
}

void RandomNames_Setup () {
    for (int i = 0; i < randomNames.Length; i++) {randomNames[i] = "";};
    int nameLength = 0;
    int letterNumber = 0;
    for (int i = 0; i < randomNames.Length; i++) {
        nameLength = Random.Range(5,20);
        for (int j = 0; j < nameLength; j++) {
            letterNumber = Random.Range(0,alphabet.Length);
            randomNames[i] += alphabet[letterNumber];
        }
    }
    for (int i = 0; i < randomNames.Length; i++) {
        for (int j = 0; j < randomNames.Length; j++) {
            if (randomNames[i] == randomNames[j]) {
                if (i != j) {
                    Debug.Log("NAMES DOUBLE");
                }
            }
        }
    }
}

void SetHashes () {
    for (int i = 0; i < randomNamesHashes.Length; i++) {
        randomNamesHashes[i] = Animator.StringToHash(randomNames[i]);
    }
}

void CheckForCollisions_ () {
    for (int i = 0; i < randomNamesHashes.Length; i++) {
        for (int j = 0; j < randomNamesHashes.Length; j++) {
            if (randomNamesHashes[i] == randomNamesHashes[j]) {
                if (i != j) {
                    Debug.Log("COLLISION");
                }
            }
        }
    }
}

}

I have to go now, but I will probably let this run tomorrow for some time.
Maybe even for Shader.PropertyToID.

So unfortunately it is case 1.

Just tested it… and yeah… it just plays the first one with that hash.

Not really cool I must say.

But honestly… I’m not super surprised with Unity’s track record.

8 Likes

Well, your work has provided some insight, on the off chance someone ever gets stuck with an animation that never plays. :slight_smile:

1 Like

So last night I’m laying in bed.

And I’m like, “wait, if they’re using the hash to get the index in a table… that means calling ‘Play’ by the state name just calls StringToHash for us. Do they not have buckets and not check equality? Are they assuming uniqueness???”

So this morning I tested.

Named 2 states values that evaluate to the same hash.

And sure enough… it only plays the first of the 2 states when calling by string.

So this problem technically effects even calling it by strings.

So really… caching the StringToHash is still preferred because the issue still arises for both.

Not cool.

I’d consider this a bug.

10 Likes