How to deal with escaped strings when storing encoded strings in Unity assets?

This is a very specific question about how Unity deals with strings that contain escape characters like the backslash when serializing assets.

I read string data from an Excel file and write it to a ScriptableObject on disk. Unity seems to handle almost all encoding issues for me. Basically, I can use non-ASCII symbols (like umlauts and smileys) and Unity will correctly store them as escaped Unicode code points. For example, the German word merkwürdig is serialized as merkw\xFCrdig on disk. So that's all fine and dandy.

However, when I deliberately type such escaped Unicode characters into the text in Excel by hand, Unity escapes the backslash a second time, which breaks things like soft hyphens or invisible spaces in Unity's text components. For example, my text in Excel is Polizei\u00ADbericht, but the YAML on disk shows Polizei\\u00ADbericht.
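To illustrate what is probably going on (an assumption about the input, not confirmed at this point in the thread): a cell typed as Polizei\u00ADbericht holds the six literal characters \ u 0 0 A D, whereas a text component needs one real U+00AD character. In C#, the verbatim @"..." literal keeps the backslash, the normal literal decodes it (EscapeDemo is just an illustrative name):

```csharp
using System;

public static class EscapeDemo
{
    // What a hand-typed cell contains: "Polizei", the six literal characters
    // \ u 0 0 A D, then "bericht". The @ prefix disables C# escape processing.
    public const string Literal = @"Polizei\u00ADbericht";

    // What a text component needs: a single real U+00AD (soft hyphen) character.
    public const string Decoded = "Polizei\u00ADbericht";
}
```

Literal is 20 characters long with a backslash at index 7; Decoded is 15 characters long with the soft hyphen at index 7. A YAML writer that faithfully stores the first string has to escape its backslash inside a double-quoted scalar, which would be consistent with the Polizei\\u00ADbericht seen on disk.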

What is going on here and how do I fix it? Is this something to fix in Excel, should I somehow unescape strings in my import script that writes to the ScriptableObject, or do I need to do this before setting the text component?

I've tried using System.Uri.UnescapeDataString, but to no avail, and I couldn't find anything to configure in Excel to make this work either. If I check string.Contains(@"\") on the input I read from Excel, it returns false, so I can't really detect the double escape.
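For what it's worth, System.Uri.UnescapeDataString decodes URL percent-encoding (%XX sequences of UTF-8 bytes), not backslash escapes, which would explain why it had no effect here. A small comparison (the class name is only for illustration):

```csharp
using System;

public static class PercentVsBackslash
{
    // Percent-encoding: U+00AD is the UTF-8 byte pair C2 AD, written %C2%AD.
    // UnescapeDataString turns this into the real soft hyphen character.
    public static string FromPercent() => Uri.UnescapeDataString("Polizei%C2%ADbericht");

    // A backslash escape is not percent-encoding, so it passes through unchanged.
    public static string FromBackslash() => Uri.UnescapeDataString(@"Polizei\u00ADbericht");
}
```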

Are you sure it is Unity doing that escaping? When I paste this text into a string field in my ScriptableObject, the resultant YAML on disk has exactly the same single backslash and all other characters.

In my SO in the editor:

[Screenshot, 2021-02-11: the pasted string in the ScriptableObject's string field in the Inspector]

Diff on disk:

[Screenshot, 2021-02-11: diff of the asset's YAML on disk showing a single backslash]

As an aside, I do see the exact behavior you note for the first text with the umlaut.


No, not really. I’m using the “LightweightExcelReader” plugin to parse my Excel file and then write the strings to the Unity asset. It’s just so hard to debug, because even the debugger window does some sort of escaping/unescaping of strings.

I did find a quick fix a few minutes ago: calling Regex.Unescape on my input before writing it into the asset correctly produces a single backslash instead of a double backslash. System.Uri.UnescapeDataString, for some reason, did nothing.
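A minimal sketch of that quick fix, assuming the import script receives one string per cell from the reader (ExcelTextImport and DecodeCell are hypothetical names, not part of LightweightExcelReader). Keep in mind that Regex.Unescape interprets every regex-style escape, so \n, \t and \xFC are decoded too, and as far as I know it throws an ArgumentException on escapes it doesn't recognize, such as \q:

```csharp
using System;
using System.Text.RegularExpressions;

public static class ExcelTextImport
{
    // Hypothetical helper: run each cell string through this before assigning
    // it to the ScriptableObject field. Regex.Unescape turns the six literal
    // characters "\u00AD" into the single character U+00AD.
    public static string DecodeCell(string raw)
    {
        if (string.IsNullOrEmpty(raw))
            return raw;
        try
        {
            return Regex.Unescape(raw);
        }
        catch (ArgumentException)
        {
            // Unknown escape (e.g. "\q"): keep the cell as-is instead of
            // aborting the whole import.
            return raw;
        }
    }
}
```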

This means it might simply be an issue with the .xlsx file reader I'm using, but I'm also unsure what to expect. Like, is it even correct to assume that Unity must store a single backslash? It seems so, because TextMeshPro then works correctly.

This I am not sure about. I would just defer to what the docs say about escaping code points.

I have never seen a case where XLSX (or DOCX) files result in anything but an endless stream of headaches, often headaches that develop years down the road when subtle changes occur in the runtime environment.

Depending on your data, XLSX and DOCX files are almost always overkill in complexity, plus they often tap into all kinds of random crapware in your system such as the Microsoft Windows security policy stuff.

I gave up on any sort of non-text storage and just use CSV or TXT files for data inputs to my games. Escaping still has to happen, but at least you control the bytes.
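If you control the input format, one option (an assumption, not something proposed in this thread) is to support exactly one escape form and decode only that. This sketch converts \uXXXX sequences into the corresponding UTF-16 code unit and leaves every other backslash alone; UnicodeEscapes.Decode is a hypothetical helper name:

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class UnicodeEscapes
{
    // Matches a backslash, 'u', and exactly four hex digits, e.g. \u00AD.
    static readonly Regex Pattern = new Regex(@"\\u([0-9A-Fa-f]{4})");

    // Replaces only \uXXXX sequences; all other backslashes are kept,
    // so text such as "C:\data\new" survives unchanged.
    public static string Decode(string input)
    {
        if (input == null)
            return null;
        return Pattern.Replace(input, m =>
            ((char)int.Parse(m.Groups[1].Value, NumberStyles.HexNumber,
                             CultureInfo.InvariantCulture)).ToString());
    }
}
```

Unlike Regex.Unescape, this never throws on stray backslashes such as Windows paths, but it also gives you no way to write a literal \u00AD, so pick an escape form that won't appear in your real text.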
