TextAsset or ScriptableObject for a Words Game

Hi there,

I'm building a word game and would like to ask experienced folks which option would be better performance-wise: a TextAsset that I load at runtime and parse into a HashSet, or a ScriptableObject with a HashSet?

Please advise
Thanks

It does not matter, as both are assets and you can do anything with them that you can do with any asset: reference, load, unload, include in an asset bundle, whatever. Use whichever you feel more comfortable with, or the more challenging one if you want a challenge. As a side note, you can create your own type of asset and a ScriptedImporter for this :wink:

I'm not really familiar with this, which is why I'm posting here, but my assumption is that a ScriptableObject will be better: with a plain text asset the text-to-array conversion happens at runtime, whereas a ScriptableObject will already be in native form.
Again, I'm not so sure, so I'm hoping for better advice here.

While a scriptable object gives you better control over the data structure, a text asset allows you to even read it line by line from disk, i.e. it gives you more control over the text format and I/O operations. Also note that Unity can't serialize things like hashmaps, at least afaik.

If you’re talking about a multi-megabyte dictionary for a word game, you should encode the individual words as strings in an array in a scriptable object. Parsing a giant piece of text on many mobile devices can be excruciatingly slow.

Thanks, that's what I thought: an SO will be the better option.
But instead of an array, I want to use a HashSet, so that checking for a valid word won't have to iterate over a million words.
So does an SO work with that?

An SO can only store a list or array of strings. But you can manually initialize the hash set from its data.
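
As a rough sketch of that idea: the class below takes a `string[]` (the shape Unity can serialize) and builds a `HashSet<string>` from it once. `WordLookup` is a hypothetical name, and the code is plain C# so it can run outside Unity; in a ScriptableObject the array would presumably be a serialized field and the set would be built in `OnEnable`. It also assumes the stored words are lowercase.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: wraps a serialized-style string[] in a HashSet for fast lookups.
public sealed class WordLookup
{
    private readonly HashSet<string> words;

    public WordLookup(string[] serializedWords)
    {
        // Ordinal comparison avoids culture-sensitive string comparison.
        words = new HashSet<string>(serializedWords, StringComparer.Ordinal);
    }

    // Number of distinct words (duplicates in the array collapse).
    public int Count => words.Count;

    // Assumption: the list is stored lowercase, so the query is normalized to match.
    public bool Contains(string candidate)
    {
        return words.Contains(candidate.ToLowerInvariant());
    }
}
```

Building the set is a one-time cost proportional to the number of words; after that each lookup hashes only the candidate word instead of scanning the array.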

ScriptableObjects are still saved as files and have to be parsed when they are loaded into memory. Just because the Unity serializer is doing the work doesn’t mean it’s not being done.

I haven’t profiled it (and it would be difficult to profile it, given that Unity’s loading of the ScriptableObject might happen before you’re able to kick off any Profiler samples), but if the data is literally just a word list, it would not at all surprise me if a well-written text parser is faster than loading a ScriptableObject.

You’d also have the advantage that it’d be easier to edit the word list (it’s just a text file, you can copy/paste, mass delete lines, etc, etc), and you can easily supplement your word list later without needing to recompile anything.
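
For reference, a minimal sketch of the text route, assuming one word per line. `WordListParser` is a hypothetical name; in Unity the input string would come from `TextAsset.text`. Whether this is fast enough has to be measured on the target device.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: turns a line-separated word list into a HashSet.
public static class WordListParser
{
    public static HashSet<string> Parse(string text)
    {
        var words = new HashSet<string>(StringComparer.Ordinal);
        using (var reader = new StringReader(text))
        {
            string line;
            // ReadLine handles \n, \r and \r\n line endings.
            while ((line = reader.ReadLine()) != null)
            {
                line = line.Trim();
                if (line.Length > 0)
                {
                    words.Add(line); // blank lines are skipped
                }
            }
        }
        return words;
    }
}
```

Note that this allocates one string per word, which is part of the cost on a large list.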

This is true.

I’ve profiled this and I’ve authored a word game. On an iPhone 4s, it will take 9 seconds to parse a 3MB line-separated text file, which you must do character-by-character to find the line breaks. Don’t do it!

Unity’s serialization model for string arrays only has to “make a mono string” for each string in the array once, which for short strings like words is constant time.

Also, you cannot serialize a hashset with native Unity serialization either.

On the other hand, you think you want a hash set, when you really want a string trie.
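
For anyone who hasn't met a trie before, here is a minimal object-based sketch. `WordTrie` and its members are hypothetical names, and this is the simple pointer-per-node form, not a packed representation. Its main advantage over a hash set for a word game is the prefix query: you can tell whether any word starts with the letters entered so far.

```csharp
using System.Collections.Generic;

// Hypothetical minimal trie: one node per character, shared prefixes stored once.
public sealed class WordTrie
{
    private sealed class Node
    {
        public readonly Dictionary<char, Node> Children = new Dictionary<char, Node>();
        public bool IsWord; // true if a word ends at this node
    }

    private readonly Node root = new Node();

    public void Add(string word)
    {
        Node node = root;
        foreach (char c in word)
        {
            if (!node.Children.TryGetValue(c, out Node next))
            {
                next = new Node();
                node.Children.Add(c, next);
            }
            node = next;
        }
        node.IsWord = true;
    }

    // True only if the exact word was added.
    public bool Contains(string word)
    {
        Node node = Find(word);
        return node != null && node.IsWord;
    }

    // True if at least one added word starts with the prefix.
    public bool HasPrefix(string prefix)
    {
        return Find(prefix) != null;
    }

    private Node Find(string text)
    {
        Node node = root;
        foreach (char c in text)
        {
            if (!node.Children.TryGetValue(c, out node)) return null;
        }
        return node;
    }
}
```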

To load either a string trie or hash set easily, you need to author an “arena allocated” memory-mappable representation of the hashset/trie. For small words, this is as easy as assuming all strings are the same maximum length, and “looking up” in a `byte[]` file. Create this in the editor using a C# script. To clarify, you are creating a `byte[]` which can be interpreted as a string trie/hash set, then saving the `byte[]` as a binary asset.
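
One possible reading of the fixed-maximum-length idea, as a sketch only: sort the words, pad each to a fixed width with zero bytes, and binary-search the resulting `byte[]` directly without ever creating strings for the entries. `PackedWordList`, `Build` and the `width` parameter are hypothetical, and the code assumes ASCII words. In Unity the buffer could be written to a `.bytes` file in the editor and read back through `TextAsset.bytes`.

```csharp
using System;
using System.Text;

// Hypothetical fixed-width packed word list: each record is `width` bytes, zero padded.
public static class PackedWordList
{
    // Editor-side step: sort ordinally and pack into one buffer.
    public static byte[] Build(string[] words, int width)
    {
        string[] sorted = (string[])words.Clone();
        Array.Sort(sorted, StringComparer.Ordinal);
        var data = new byte[sorted.Length * width];
        for (int i = 0; i < sorted.Length; i++)
        {
            if (sorted[i].Length > width)
                throw new ArgumentException("Word longer than record width: " + sorted[i]);
            // Assumption: words are plain ASCII.
            Encoding.ASCII.GetBytes(sorted[i], 0, sorted[i].Length, data, i * width);
        }
        return data;
    }

    // Runtime step: binary search over the records, no per-word allocations.
    public static bool Contains(byte[] data, int width, string word)
    {
        if (word.Length > width) return false;
        int lo = 0, hi = data.Length / width - 1;
        while (lo <= hi)
        {
            int mid = lo + (hi - lo) / 2;
            int cmp = Compare(data, mid * width, width, word);
            if (cmp == 0) return true;
            if (cmp < 0) lo = mid + 1; else hi = mid - 1;
        }
        return false;
    }

    // Bytewise compare of one record against the query; padding (0) sorts before any letter.
    private static int Compare(byte[] data, int offset, int width, string word)
    {
        for (int i = 0; i < width; i++)
        {
            int a = data[offset + i];
            int b = i < word.Length ? word[i] : 0;
            if (a != b) return a - b;
        }
        return 0;
    }
}
```

This trades some wasted padding for a format that needs no parsing at load time; lookups are O(log n) comparisons rather than a hash.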

@StarManta @doctorpangloss
Thanks guys, so it's clear that using an SO was the wrong approach, with one point of confusion which I'll discuss below.

@doctorpangloss
I really liked the string trie idea. I didn't know about it earlier, but reading the following made it clear that this is what I should use at runtime.

So if I use a string trie, how should I store/load the text, given that my words will be of different lengths? Retrieve a text file, or retrieve an array-based SO? Or do I actually serialize the trie data structure and then deserialize it at runtime?

Thanks again for helping

You want to serialize and deserialize into a memory mapped format, i.e., loading bytes from disk and interpreting them with methods, rather than turning it into an object with fields.

This is pretty challenging. A dictionary, especially a tree of dictionaries, is difficult to represent in a byte buffer. So even doing a scriptable object array of strings and creating a dictionary from that will help a lot, because if you only have 20,000 words, you will at least not allocate 20,000 strings at run time.

There are dictionaries that specialize in single-char keys and pointer values. There are lots of ways to make this very fast, and incrementally so.
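
One way to picture a "single char key, pointer value" dictionary, as a sketch under assumptions: each node is a block of 26 integer slots (one per letter a-z), and a slot holds the index of the child node, with 0 meaning "no child". `FlatTrie` is a hypothetical name, and it only supports lowercase a-z. Because everything lives in flat integer and boolean lists rather than in linked objects, this layout is closer to something that can be dumped into a buffer.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical flat trie: node n owns slots [n * 26, n * 26 + 25] in `children`.
public sealed class FlatTrie
{
    private const int Alphabet = 26;

    // 0 means "no child"; the root is node 0 and is never anyone's child.
    private readonly List<int> children = new List<int>(new int[Alphabet]);
    private readonly List<bool> isWord = new List<bool> { false };

    public void Add(string word)
    {
        int node = 0;
        foreach (char c in word)
        {
            int slot = node * Alphabet + Index(c);
            if (children[slot] == 0)
            {
                children[slot] = isWord.Count;        // index of the new node
                isWord.Add(false);
                children.AddRange(new int[Alphabet]); // empty slots for the new node
            }
            node = children[slot];
        }
        isWord[node] = true;
    }

    public bool Contains(string word)
    {
        int node = 0;
        foreach (char c in word)
        {
            if (c < 'a' || c > 'z') return false;
            node = children[node * Alphabet + (c - 'a')];
            if (node == 0) return false;
        }
        return isWord[node];
    }

    private static int Index(char c)
    {
        if (c < 'a' || c > 'z')
            throw new ArgumentException("Only a-z is supported in this sketch.");
        return c - 'a';
    }
}
```

The cost is memory: 26 integers per node even when most slots are empty, so a real implementation would likely compress sparse nodes.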