Hello,
I’m porting an old Unity word game I made from PC standalone to iOS and I’m running into a hangup with the time it takes for my code to construct a trie from about 50,000 elements. On the Mac and Windows, it took a little over a second to load the words and construct the trie, but on the iPad I’m testing on, it takes about 30 seconds to do the same job. This is obviously an unacceptable wait time to load the game, so I was hoping someone might have some suggestions for another data structure I could use or some optimizations for the one I’ve constructed. The following is my trie code (the empty “if” blocks are where I had logging code inserted):
```
class TrieNode {
    var letter : String;
    var isWord : boolean;
    var nextNode = new Hashtable();

    function Init() {
        this.letter = "\0";
        this.isWord = false;
    }
}

var trieRoot = new TrieNode();
var numNodes = 0;

function NewNode(letter : String) {
    var theNode = new TrieNode();
    numNodes++;
    theNode.Init();
    theNode.letter = letter;
    return theNode;
}

function AddWords(words : String[]) {
    for (var word : String in words) {
        AddWord(word);
    }
    var appControl = FindObjectOfType(AppControl) as AppControl;
    appControl.SendMessage("OnLoadWordsComplete");
}

function AddWord(word : String) {
    var currNode = trieRoot;
    for (var currLetter : char in word) {
        var nextLetter = "" + currLetter;
        if ( currNode ) {
            if ( currNode.nextNode[nextLetter] ) {
                // (logging code was here)
            }
            else {
                currNode.nextNode[nextLetter] = NewNode(nextLetter);
            }
            currNode = currNode.nextNode[nextLetter] as TrieNode;
        }
        else {
            // (logging code was here)
        }
    }
    currNode.isWord = true;
}

function IsWord(word : String) : boolean {
    var currNode = trieRoot;
    for (var currLetter : char in word) {
        var nextLetter = "" + currLetter;
        if ( currNode ) {
            if ( currNode.nextNode[nextLetter] ) {
                // (logging code was here)
            }
            else {
                return false;
            }
            currNode = currNode.nextNode[nextLetter] as TrieNode;
        }
        else {
            return false;
        }
    }
    // Return false as well as true, so every code path returns a value.
    return currNode.isWord;
}
```
Thanks,
~Rob
Here’s some code I have used. It might have a few errors in this version (it’s also not tested on the latest Unity), but the data structures are what you need.
A 100k-word list loads in about 0.1 seconds on my Mac and about 3 seconds on a 3GS.
The main difference I can see is that the main data structure uses a fixed-length array rather than a hashtable. Also, a node doesn’t need to know what letter it is, as its parent “knows” by virtue of the array position. A single hashtable is used during construction to speed up the process.
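For readers following along outside Unity, here is a minimal Python sketch of the node layout just described: each node holds only an `is_word` flag and a fixed 26-slot children array, and the letter is implied by the slot index. It assumes lowercase a-z words, and it builds the trie with a plain top-down walk rather than the construction-time hashtable used in the code below; `TrieNode`, `Trie`, `add` and `contains` are illustrative names, not part of the original code.

```python
class TrieNode:
    """Trie node that stores no letter: the letter is implied by the
    slot this node occupies in its parent's children array."""

    __slots__ = ("is_word", "children")

    def __init__(self):
        self.is_word = False
        self.children = [None] * 26  # one fixed slot per letter a-z


class Trie:
    def __init__(self, words=()):
        self.root = TrieNode()
        for word in words:
            self.add(word)

    def add(self, word):
        node = self.root
        for ch in word:
            i = ord(ch) - ord("a")  # slot index; assumes lowercase a-z
            if not 0 <= i < 26:
                raise ValueError(f"unsupported character {ch!r} in {word!r}")
            if node.children[i] is None:
                node.children[i] = TrieNode()
            node = node.children[i]
        node.is_word = True

    def contains(self, word):
        node = self.root
        for ch in word:
            i = ord(ch) - ord("a")
            # Anything outside a-z, or a missing child, means "not a word".
            if not 0 <= i < 26 or node.children[i] is None:
                return False
            node = node.children[i]
        return node.is_word
```

For example, after `t = Trie(["cat", "cats", "car"])`, `t.contains("cat")` is true while `t.contains("ca")` is false, because the `ca` node exists only as a prefix and its `is_word` flag was never set.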
```
class WordTrie {
    var word:boolean;
    var children:WordTrie[];

    function WordTrie(word:boolean) {
        this.word = word;
        children = new WordTrie[26];
    }

    function WordTrie(word:boolean, children:WordTrie[]) {
        this.word = word;
        this.children = children;
    }
}

class Dictionary {
    var words:WordTrie;
    var nodes:Hashtable;

    // Construct the dictionary
    function Dictionary() {
        nodes = new Hashtable();
        generateDictionary();
        words = (nodes[""] as WordTrie);
        nodes = null;
    }

    // Check if word is in the dictionary
    function isWord(word:String):boolean {
        if (word.length == 0) return false; // avoid indexing into an empty string
        return isWord(word, 0, words);
    }

    function isWord(word:String, pos:int, trie:WordTrie):boolean {
        var index:int = charToAscii(word[pos]);
        if (index < 0) return false; // not a letter a-z or A-Z
        var c:WordTrie = trie.children[index];
        if (c != null) {
            if (word.length == pos + 1) {
                return c.word;
            } else {
                return isWord(word, pos + 1, c);
            }
        } else {
            return false;
        }
    }

    function generateDictionary() {
        // Point this at your dictionary file (.txt file with a single word per line)
        var sr:System.IO.StreamReader = new System.IO.StreamReader("/tmp/words.txt");
        var word:String = sr.ReadLine();
        while (word != null) {
            addWord(word,null,0);
            word = sr.ReadLine();
        }
        sr.Close();
    }

    // Builds the trie bottom-up: each word (or prefix) is looked up in the
    // nodes hashtable, and only missing prefixes are created and linked in.
    function addWord(word:String, child:WordTrie, childPos:int) {
        if (nodes.ContainsKey(word)) {
            if (child == null){
                (nodes[word] as WordTrie).word = true;
            } else {
                (nodes[word] as WordTrie).children[childPos] = child;
            }
        } else if (word.length > 0) {
            var node:WordTrie = new WordTrie(child == null ? true : false);
            node.children[childPos] = child;
            nodes[word] = node;
            addWord(word.Substring(0, word.length - 1), node, charToAscii(word[word.length - 1]));
        } else {
            var rootNode:WordTrie = new WordTrie(false);
            rootNode.children[childPos] = child;
            nodes[word] = rootNode;
        }
    }

    // This is a lot simpler in the c# version :)
    // Maps a-z / A-Z to 0-25; returns -1 for anything else.
    static function charToAscii(c:char):int {
        var i:int = -1;
        switch (c) {
            case "a"[0]: i = 0; break;
            case "b"[0]: i = 1; break;
            case "c"[0]: i = 2; break;
            case "d"[0]: i = 3; break;
            case "e"[0]: i = 4; break;
            case "f"[0]: i = 5; break;
            case "g"[0]: i = 6; break;
            case "h"[0]: i = 7; break;
            case "i"[0]: i = 8; break;
            case "j"[0]: i = 9; break;
            case "k"[0]: i = 10; break;
            case "l"[0]: i = 11; break;
            case "m"[0]: i = 12; break;
            case "n"[0]: i = 13; break;
            case "o"[0]: i = 14; break;
            case "p"[0]: i = 15; break;
            case "q"[0]: i = 16; break;
            case "r"[0]: i = 17; break;
            case "s"[0]: i = 18; break;
            case "t"[0]: i = 19; break;
            case "u"[0]: i = 20; break;
            case "v"[0]: i = 21; break;
            case "w"[0]: i = 22; break;
            case "x"[0]: i = 23; break;
            case "y"[0]: i = 24; break;
            case "z"[0]: i = 25; break;
            case "A"[0]: i = 0; break;
            case "B"[0]: i = 1; break;
            case "C"[0]: i = 2; break;
            case "D"[0]: i = 3; break;
            case "E"[0]: i = 4; break;
            case "F"[0]: i = 5; break;
            case "G"[0]: i = 6; break;
            case "H"[0]: i = 7; break;
            case "I"[0]: i = 8; break;
            case "J"[0]: i = 9; break;
            case "K"[0]: i = 10; break;
            case "L"[0]: i = 11; break;
            case "M"[0]: i = 12; break;
            case "N"[0]: i = 13; break;
            case "O"[0]: i = 14; break;
            case "P"[0]: i = 15; break;
            case "Q"[0]: i = 16; break;
            case "R"[0]: i = 17; break;
            case "S"[0]: i = 18; break;
            case "T"[0]: i = 19; break;
            case "U"[0]: i = 20; break;
            case "V"[0]: i = 21; break;
            case "W"[0]: i = 22; break;
            case "X"[0]: i = 23; break;
            case "Y"[0]: i = 24; break;
            case "Z"[0]: i = 25; break;
        }
        return i;
    }
}
```
I did have an attribution requirement on using this code, but I’ll waive it for you.
Also the charToAscii function can be replaced with something like an if statement in conjunction with System.Convert.ToInt32.
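In Python the same idea is a range check plus code-point arithmetic with `ord()`, which plays the role `System.Convert.ToInt32` would in the suggestion above. A sketch, with the hypothetical name `char_to_index`, that keeps the switch's contract of returning 0-25 for either case and -1 for anything else:

```python
def char_to_index(c: str) -> int:
    """Map a single character 'a'..'z' or 'A'..'Z' to 0..25; return -1 otherwise.

    Same contract as the 52-case switch, but computed from the character's
    code point instead of listing every letter.
    """
    if "a" <= c <= "z":
        return ord(c) - ord("a")
    if "A" <= c <= "Z":
        return ord(c) - ord("A")
    return -1
```

Because the letters a-z (and A-Z) are contiguous in ASCII/Unicode, subtracting the code point of the first letter gives the array slot directly, and there is no table of cases to mistype.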
Thanks for the reply, Johnny.
I’m still trying to remember what a trie is and how it works as I attempt to implement your code. In the meantime, I’ve gotten the code to compile, but Unity throws a NullReferenceException at the first if statement of the addWord() function. I don’t use Hashtables much myself, so I haven’t yet figured out what’s wrong with the implementation as it stands. Any thoughts as I continue to hack away at it?
~r
It would be quite a bit faster/better to use Dictionary instead of Hashtable, for the same reasons you would use List instead of ArrayList (or the JS Array class).
–Eric
@JohnnyA:
Alright, after I figured out my old code (reading years-old code is fun . . .) and after I sat staring blankly at my screen for a while trying to grasp why the nodes didn’t need to know what letters they were, I finally had my epiphany and rewrote my original trie (I never did get your code working). I came up with the following (for anyone else who comes across this thread):
```
class TrieNode {
    var isWord : boolean;
    var nextLetter : TrieNode[];

    function TrieNode() {
        isWord = false;
    }
}

private var trieRoot : TrieNode;

function Start() {
    trieRoot = new TrieNode();
    trieRoot.nextLetter = new TrieNode[26];
}

function GenerateDictionary(words : String[]) {
    for (var word : String in words) {
        AddWord(word);
    }
    var appControl = FindObjectOfType(AppControl) as AppControl;
    appControl.SendMessage("OnLoadWordsComplete");
}

// charToAscii() is the letter-to-index function from JohnnyA's code
// (0-25 for a-z / A-Z, -1 for anything else). Non-letters are skipped here.
function AddWord(word : String) {
    var currNode = trieRoot;
    for (var c : char in word) {
        var i : int = charToAscii(c);
        if ( i > -1 ) {
            if ( currNode ) {
                if ( !currNode.nextLetter[i] ) {
                    currNode.nextLetter[i] = new TrieNode();
                    currNode = currNode.nextLetter[i];
                    currNode.nextLetter = new TrieNode[26];
                }
                else {
                    currNode = currNode.nextLetter[i];
                }
            }
        }
    }
    currNode.isWord = true;
}

function IsWord(word : String) : boolean {
    var currNode = trieRoot;
    for (var c : char in word) {
        var i : int = charToAscii(c);
        if ( i > -1 ) {
            if ( !currNode.nextLetter[i] ) {
                return false;
            }
            currNode = currNode.nextLetter[i];
        }
        else {
            return false;
        }
    }
    return currNode.isWord;
}
```
It works beautifully with 3-4 second generation time on my iPad. FAR more usable!
Thanks for taking the time to help and for the great suggestions!
~Rob
@BasketQase
Sorry the code wasn’t running for you (I do have a running version on 3.x in a half-finished game, but that’s on a different machine); however, I’m glad you solved your problems.
@Eric
I was mainly concerned about memory footprint and look-up speed not generation speed; it wasn’t causing an issue so I never got round to optimising the generation. If I ever push the project to release I’ll be sure to update and test with Dictionary.
Cheers.
@BasketQase: can you give any more information about what function the trie performs in the game? Historically, they have been an efficient way to check for the presence or absence of a word in a dictionary but they perform quite badly on modern hardware due to the pattern of memory usage. If you need autocompletion or some of the other functions of a trie then it may still be the best option but a simple lookup, say, would be easier to do with a hashtable or skip-list.
@andeeee:
The user inputs a word 3-7 characters long and I just need to find out if it exists in a word list of about 50k entries. I don’t really remember why I chose to use a trie (I wrote the original app in 2007), but it was likely simply the first solution that came up in Google that I could understand. How would one use a hashtable or skip list (the latter of which I’ve not heard of)?
~r
You can use strings as the keys for a hashtable/dictionary. The value stored under each key is not important, since the lookup only needs to determine whether the key exists. For example, with the words already added to the table as keys, you could check for the presence of a word with something like:
```
if (wordList.Contains(candidateWord)) {
    // Word is in the list.
}
```
Hashtables are much more efficient than tries for both inserting and looking up items, but tries support additional operations (e.g., generating all words starting with a given prefix).
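To make the prefix operation concrete, here is a minimal Python sketch using a nested-dictionary trie (an illustrative structure, not code from this thread; `build_trie`, `words_with_prefix` and the `END` marker are hypothetical names). It walks down to the prefix's node once, then collects every word below it:

```python
END = ""  # a single character is never "", so this key safely marks "a word ends here"


def build_trie(words):
    """Nested-dict trie: each level maps a letter to the next level."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def words_with_prefix(trie, prefix):
    """Return every stored word that starts with `prefix`, in sorted order."""
    # Walk down to the node that represents the prefix itself.
    node = trie
    for ch in prefix:
        node = node.get(ch)
        if node is None:
            return []  # no stored word has this prefix

    results = []

    def collect(n, path):
        # END ("") sorts before any letter, so a word comes before its extensions.
        for key in sorted(n):
            if key == END:
                results.append(path)
            else:
                collect(n[key], path + key)

    collect(node, prefix)
    return results
```

With `["cat", "cats", "car", "dog"]` loaded, `words_with_prefix(trie, "ca")` visits only the `ca` subtree; a plain hash set would have to test every stored key to answer the same question.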
A skip list, in the simple two-level form described here, is just a sorted array or linked list accompanied by a second list that acts as a kind of index on the main list (a full skip list stacks several such index levels). For example, in your game, the main list would contain the words in alphabetical order and the index list might contain every hundredth word in the main list along with its position. You then search the index until you find the last entry that comes before your candidate word alphabetically. Then you start at that entry’s position in the main list and search linearly until you find the word you want or reach one that comes after it alphabetically (which indicates that the candidate word isn’t in the dictionary). This type of structure is probably overkill for what you are doing, but it can be designed so that it loads very quickly.
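The indexed-list lookup just described can be sketched in Python as follows. This is an illustration under stated assumptions, not code from the thread: `IndexedWordList` is a hypothetical name, `step=100` mirrors the "every hundredth word" example, and the index is searched with `bisect` (binary search) where a linear walk over the small index would work too.

```python
from bisect import bisect_right


class IndexedWordList:
    """Sorted word list plus a sparse index of every `step`-th word and its position."""

    def __init__(self, words, step=100):
        # If the word file is already sorted and duplicate-free, this sort could be skipped,
        # which is what makes the structure quick to load.
        self.words = sorted(set(words))
        self.step = step
        self.index_words = self.words[::step]
        self.index_positions = list(range(0, len(self.words), step))

    def contains(self, candidate):
        # Last index entry that sorts at or before the candidate.
        k = bisect_right(self.index_words, candidate) - 1
        if k < 0:
            return False  # candidate sorts before the first word
        # Linear scan of the main list, starting at that entry's position.
        start = self.index_positions[k]
        end = min(start + self.step, len(self.words))
        for pos in range(start, end):
            word = self.words[pos]
            if word == candidate:
                return True
            if word > candidate:
                return False  # passed where it would be, so it is not in the list
        return False
```

The scan never needs to go past the next index entry, because every word between two consecutive index entries sorts between them.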
@andeeee
Thanks for the input. I wish I’d known to or thought to use a hashtable before! It was far less complex to implement and is about 4 times faster than the most recent trie implementation I was running (the trie clocked in about 3-4 seconds, a simple hashtable was under a second).
For anyone who comes across this later, this is the code I’m using now:
```
private var wordList : Hashtable;

function Start() {
    wordList = new Hashtable();
}

function GenerateDictionary(words : String[]) {
    for (var word : String in words) {
        // I just set the value to a boolean, 'cause it's small.
        // The boolean's not used for anything.
        // Indexer assignment instead of Add(), so a duplicate entry in the
        // word list doesn't throw an ArgumentException.
        wordList[word] = true;
    }
    var appControl = FindObjectOfType(AppControl) as AppControl;
    appControl.SendMessage("OnLoadWordsComplete");
}

function IsWord(word : String) : boolean {
    return wordList.Contains(word);
}
```
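The same approach in Python is a built-in `set`, since only key membership matters. A minimal sketch, with hypothetical names `load_word_set` and `is_word`, assuming (unlike the Unity code above) that lookups should ignore case and that the source is one word per line:

```python
def load_word_set(lines):
    """Build a hash set from a one-word-per-line source (file object, list, ...).

    Each entry is stripped of whitespace/newlines and lowercased (an assumption:
    case-insensitive matching); blank lines are skipped and duplicates collapse.
    """
    return {line.strip().lower() for line in lines if line.strip()}


def is_word(word_set, candidate):
    # Average O(1) membership test; no per-key value is stored at all.
    return candidate.strip().lower() in word_set
```

Usage would be along the lines of `with open(path) as f: words = load_word_set(f)`, where `path` points at your word file.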
Thanks again for all the help everyone!
~Rob
Bit of a dead topic, but even if you aren’t doing autocompletion, another reason to use a trie is memory usage. An efficient trie (which my example certainly isn’t) will use a fair bit less memory than a standard hashtable if your dataset has a lot of overlapping prefixes (e.g. very large word lists). If you share common suffixes too, you get a directed acyclic graph, which is smaller still (although you would probably want to precalculate it, as it can be expensive to generate).
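The suffix-sharing idea can be sketched as a one-off precalculation pass over an ordinary trie: walk it bottom-up and merge every node that has the same `is_word` flag and the same outgoing edges, using a registry of canonical nodes. The result is the structure usually called a DAWG (minimal acyclic word automaton). All names below are hypothetical, and building the full trie first is the simplest approach rather than the most memory-efficient one.

```python
class Node:
    __slots__ = ("is_word", "children")

    def __init__(self):
        self.is_word = False
        self.children = {}  # letter -> Node


def build_trie(words):
    root = Node()
    for word in words:
        node = root
        for ch in word:
            nxt = node.children.get(ch)
            if nxt is None:
                nxt = node.children[ch] = Node()
            node = nxt
        node.is_word = True
    return root


def minimize(node, registry):
    """Merge identical subtrees bottom-up; returns the canonical node for `node`."""
    for ch, child in list(node.children.items()):
        node.children[ch] = minimize(child, registry)
    # Children are canonical now, so their identities fully describe this subtree.
    # Canonical nodes stay alive in `registry`, so their id() values stay unique.
    key = (node.is_word, tuple(sorted((ch, id(c)) for ch, c in node.children.items())))
    return registry.setdefault(key, node)


def count_nodes(root):
    """Count distinct reachable nodes (shared nodes are counted once)."""
    seen, stack = set(), [root]
    while stack:
        n = stack.pop()
        if id(n) not in seen:
            seen.add(id(n))
            stack.extend(n.children.values())
    return len(seen)


def contains(root, word):
    node = root
    for ch in word:
        node = node.children.get(ch)
        if node is None:
            return False
    return node.is_word
```

For `["tap", "taps", "top", "tops"]` the trie has 8 nodes; after `minimize(root, {})` the `a` and `o` branches end in the same shared `p -> s` chain, leaving 5 nodes, and lookups work exactly as before.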
Also fixed the bug in my sample code so it should run now 