Loading a large word list

Hi all,

I have a word game I’m porting over to iPad that reads from a word list and loads each word into a trie at startup. This was working fine in our Mac, Windows, and web builds, but I had to change the way the file was stored and loaded in order to run the app on an iPad. The new technique takes (literally) half an hour or more to load the file. I was hoping someone here might have a better solution to offer.

My original setup simply had the word list as a plain text file that sat uncompressed in the application’s package contents and was loaded via this code:

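if (!File.Exists(FILENAME)) {
	print("File does not exist at: " + FILENAME);
}
else {
	// Declared locally so the snippet stands on its own.
	string input;

	// Read the list one word per line; "using" closes the file even if an exception is thrown.
	using (StreamReader stream = File.OpenText(FILENAME))
	{
		while ((input = stream.ReadLine()) != null)
		{
			trieTable.SendMessage("AddWord", input);
		}
	}

	appControl.SendMessage("OnLoadWordsComplete");
}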

I’ve now changed to putting the file into the Resources folder and assigning it via the Unity editor, then calling this code:

public TextAsset theFile;

...

// theFile.text holds the whole word list; read it back one line at a time.
using (StringReader reader = new StringReader(theFile.text))
{
	while ((input = reader.ReadLine()) != null)
	{
		trieTable.SendMessage("AddWord", input);
	}
	
	appControl.SendMessage("OnLoadWordsComplete");
}

However, as I mentioned, this way is unbearably slow and I don’t know enough to understand why.

The word list is a newline-separated list of 53,800 words and weighs in at about 350 KB on disk, if that’s important to know.

Thanks,
~Rob

What I did was use a text asset, which loaded the file for me. Then I just did a simple m_words = textAsset.text.Split('\n'); and it loads the whole thing lickety-split, over 100k words, without much problem.
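
For anyone finding this later, here is a rough C# sketch of the Split() approach. WordListParser and Parse are hypothetical names, not part of Unity; in a Unity project you would pass in textAsset.text. The trimming step is an assumption on my part, in case the file has Windows line endings or blank lines.

```csharp
using System;
using System.Collections.Generic;

public static class WordListParser
{
    // Splits a newline-separated word list into individual words.
    // Surrounding whitespace (including a trailing '\r') is trimmed
    // and empty lines are skipped.
    public static List<string> Parse(string text)
    {
        string[] lines = text.Split('\n');
        List<string> words = new List<string>(lines.Length);

        foreach (string line in lines)
        {
            string word = line.Trim();
            if (word.Length > 0)
            {
                words.Add(word);
            }
        }
        return words;
    }
}
```

Usage would be something like List<string> words = WordListParser.Parse(textAsset.text); where textAsset is whatever TextAsset you assigned in the editor.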

Thanks. That solves the loading issues. Split() is super fast. Now it seems I have to speed up my parsing code as it still takes just over 30 seconds to get the word trie set up when run on the iPad (it takes slightly over a second to load and parse in the editor). Is there a huge difference in the speed of JavaScript versus C#? Should I consider rewriting the trie in C#?

Thanks again for the Split() suggestion. That was exactly what was needed.

~r

There’s generally no major difference in JS and C# speed if you’re doing the same thing in either language.

–Eric

SendMessage isn’t the fastest way to invoke a method 50,000 times in a row. Just obtain the script instance and call the method directly.
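
A minimal sketch of what calling the script directly might look like, assuming the trie is a C# component. TrieTable and WordLoader are hypothetical classes made up for illustration (the trie in this thread was written in JavaScript), and in a real project each MonoBehaviour would live in its own file named after the class.

```csharp
using UnityEngine;

// Hypothetical trie component; only the public AddWord signature matters here.
public class TrieTable : MonoBehaviour
{
    public void AddWord(string word)
    {
        // Insert the word into the trie here.
    }
}

// Hypothetical loader that calls the trie directly instead of via SendMessage.
public class WordLoader : MonoBehaviour
{
    public TextAsset theFile;     // assigned in the editor
    public GameObject trieTable;  // the object that carries the TrieTable script

    void Start()
    {
        // Look the component up once, outside the loop.
        TrieTable trie = trieTable.GetComponent<TrieTable>();

        foreach (string line in theFile.text.Split('\n'))
        {
            string word = line.Trim();
            if (word.Length > 0)
            {
                // Ordinary method call, no per-word lookup by method name.
                trie.AddWord(word);
            }
        }
    }
}
```

The point is only that GetComponent runs once and every word after that is a plain method call, whereas SendMessage has to resolve "AddWord" by name on each call.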

@mantasp:

Thanks, yeah, it’s not coded that way anymore. It was one of the first things I changed after adding Split(). I just send the whole array to the trieTable and have it split and addWords using a private method instead. (Well, actually now I just use a hashtable, but that’s what I was doing when last I was using a trie.)

~r
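
If a plain membership test is all the game needs, the hashtable route mentioned above could be sketched roughly like this. WordSet is a hypothetical name, and this uses HashSet<string> rather than System.Collections.Hashtable, assuming HashSet is available in your Mono version (a Dictionary<string, bool> would work much the same way if it is not). The case-insensitive comparer is also my assumption.

```csharp
using System;
using System.Collections.Generic;

public class WordSet
{
    private readonly HashSet<string> words;

    // Builds the set from the raw newline-separated text in one pass.
    public WordSet(string text)
    {
        words = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        foreach (string line in text.Split('\n'))
        {
            string word = line.Trim();
            if (word.Length > 0)
            {
                words.Add(word);
            }
        }
    }

    public int Count
    {
        get { return words.Count; }
    }

    // True if the whole word is in the list (case-insensitive).
    public bool Contains(string word)
    {
        return words.Contains(word);
    }
}
```

The trade-off is that a hash set only answers "is this whole word in the list?"; if the game needs prefix queries, a trie is still the better fit.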