WWW.text string not correct?

Hi there, I’ve been hitting my head against the wall for a few weeks now because of some weird behaviour I can’t quite figure out.

My web player has to read a JSON data file from a folder on the same server (in a subfolder, so the security sandbox doesn’t complain). It fetches the file with the WWW class and parses the data with the JSONObject class. I’ve coded it so that, if the file isn’t available, it reads some default data from a JSON file in the Resources folder. The problem is that when I use the WWW.text property to access the downloaded data, the JSONObject class can’t parse it, whereas when it reads the very same data from the file in the Resources folder (via the TextAsset.text property) there’s no problem at all. Both files are UTF-8 encoded.

I’ve narrowed the problem down to WWW.text and TextAsset.text reading the same UTF-8 encoded file and returning different strings (or something along those lines, because I’m lost). The docs on WWW.text say that the contents of the web page must be in the UTF-8 or ASCII character set, which they are, but the string I get from WWW.text still can’t be parsed. The manual on TextAsset says it can read .json files (which it does), so I really think the problem comes from reading a UTF-8 encoded file with the WWW class.

The (updated) code:

WWW myWWW = new WWW(Application.dataPath + JSONurl);
yield return myWWW;
string jsonData = "";
if (myWWW.error == null) {
	jsonData = myWWW.text;
} else {
	TextAsset myData = (TextAsset)Resources.Load(JSONFile, typeof(TextAsset));
	jsonData = myData.text;
}
JSONObject json = new JSONObject(jsonData);

Any ideas on why WWW.text could be returning a wrong string while reading from a UTF-8 encoded file?

Thanks in advance.

[UPDATE] I’ve updated the code and done some more tests. The string that receives the data from WWW.text displays the contents of the file correctly, regardless of the file’s encoding (ANSI or UTF-8). It looks like the strings returned by WWW.text and TextAsset.text differ in some invisible way, and JSONObject only accepts the one from TextAsset.text. I’ll keep working on it.
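One way to see how two strings can display identically and still differ: in .NET, `Encoding.UTF8.GetString` does not strip a UTF-8 BOM, so the bytes EF BB BF decode to a single zero-width character, U+FEFF, at the start of the string. The sketch below is plain .NET console code rather than Unity, and the class name `BomVisibilityDemo` is made up for illustration; it assumes the data reaching the parser was decoded this way.

```csharp
using System;
using System.Text;

public static class BomVisibilityDemo
{
    // Plain UTF-8 decoding with no BOM handling at all.
    public static string Decode(byte[] data)
    {
        return Encoding.UTF8.GetString(data);
    }

    public static void Main()
    {
        byte[] withoutBom = Encoding.UTF8.GetBytes("{\"a\":1}");

        // Same JSON text, prefixed with the UTF-8 BOM bytes EF BB BF.
        byte[] withBom = new byte[withoutBom.Length + 3];
        withBom[0] = 0xEF; withBom[1] = 0xBB; withBom[2] = 0xBF;
        Array.Copy(withoutBom, 0, withBom, 3, withoutBom.Length);

        string plain = Decode(withoutBom);
        string prefixed = Decode(withBom);

        // U+FEFF is zero-width, so both usually look like {"a":1} on screen,
        // yet the strings are not equal:
        Console.WriteLine(plain == prefixed);                // False
        Console.WriteLine(prefixed.Length - plain.Length);   // 1
        Console.WriteLine(prefixed[0] == '\uFEFF');          // True
    }
}
```

A strict parser that expects `{` or `[` as the first character would reject the prefixed string even though it prints the same.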

Finally, the solution came from a colleague of mine, who noticed while debugging that the start of the WWW.text string contained unreadable characters. And yes, the first three bytes of the downloaded data were EF BB BF, i.e. the UTF-8 BOM. His workaround (which works at least on the web player; I haven’t checked other platforms) is as simple as skipping those first three bytes, since it seems that the WWW class keeps the BOM in the WWW.text string:

WWW myWWW = new WWW(Application.dataPath + JSONurl);   // UTF-8 encoded json file on the server
yield return myWWW;
string jsonData = "";
if (string.IsNullOrEmpty(myWWW.error)) {
	jsonData = System.Text.Encoding.UTF8.GetString(myWWW.bytes, 3, myWWW.bytes.Length - 3);  // Skip the first 3 bytes (i.e. the UTF-8 BOM)
	JSONObject json = new JSONObject(jsonData);   // JSONObject works now
}
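The workaround above skips three bytes unconditionally, so if the file is ever saved without a BOM, the first three characters of the actual JSON would be cut off. A minimal sketch that checks for EF BB BF before skipping, assuming only the UTF-8 BOM needs handling; `BomUtil.DecodeUtf8` is a hypothetical helper name, written as standalone .NET code:

```csharp
using System;
using System.Text;

public static class BomUtil
{
    // Decodes UTF-8 bytes, skipping a leading BOM (EF BB BF) only if it is present.
    public static string DecodeUtf8(byte[] data)
    {
        if (data == null) throw new ArgumentNullException("data");

        bool hasBom = data.Length >= 3
            && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF;
        int offset = hasBom ? 3 : 0;

        return Encoding.UTF8.GetString(data, offset, data.Length - offset);
    }

    public static void Main()
    {
        byte[] withBom = { 0xEF, 0xBB, 0xBF, (byte)'{', (byte)'}' };
        byte[] withoutBom = { (byte)'{', (byte)'}' };

        Console.WriteLine(DecodeUtf8(withBom));     // {}
        Console.WriteLine(DecodeUtf8(withoutBom));  // {}
    }
}
```

In the coroutine, the `GetString(myWWW.bytes, 3, ...)` line would then become something like `jsonData = BomUtil.DecodeUtf8(myWWW.bytes);`, which behaves the same for files with and without a BOM.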

I haven’t checked with other parsers or in other situations, but if any of you are having problems with WWW.text, you may want to check whether your strings start with a BOM.
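If you only have the text as a string (for example from WWW.text), the check can also be done on the string. This sketch assumes the BOM was decoded as UTF-8 into the single character U+FEFF; if it was instead decoded with some other encoding it could appear as several characters (such as "ï»¿") and this check would not catch it. The class and method names are hypothetical:

```csharp
using System;

public static class BomStringUtil
{
    // What the bytes EF BB BF become after UTF-8 decoding (assumption noted above).
    const char Bom = '\uFEFF';

    // True if the decoded string still starts with a byte order mark.
    public static bool StartsWithBom(string s)
    {
        return !string.IsNullOrEmpty(s) && s[0] == Bom;
    }

    // Removes leading U+FEFF characters; the rest of the text is unchanged.
    public static string StripBom(string s)
    {
        return s == null ? null : s.TrimStart(Bom);
    }

    public static void Main()
    {
        string fromWww = "\uFEFF{\"score\":10}";  // stands in for a WWW.text value
        Console.WriteLine(StartsWithBom(fromWww));                 // True
        Console.WriteLine(StripBom(fromWww) == "{\"score\":10}");  // True
    }
}
```

Used before parsing, this would look like `jsonData = BomStringUtil.StripBom(myWWW.text);`, which leaves strings without a BOM untouched.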

I’m using JSONObject from the Asset Store.

but the parser doesn’t like the string it receives if it comes from a UTF-8 encoded file

You might get a faster answer by contacting the author. They probably know whether this is a limitation of the plugin, or could provide a solution.