XML and BOM encoding.

Hi, I’m downloading an XML file from a server, saving it locally and then parsing it. When I try to load it, I receive this error:

Text node cannot appear in this state. Line 1, position 1.

Apparently this is to do with BOM encoding as discussed here:



Here is the code I’m using:

	IEnumerator DownloadXML ()
	{
		// If there is an existing file, delete it.
		if (System.IO.File.Exists (localFile)) {
			Debug.Log ("Exists: " + localFile);
			System.IO.File.Delete (localFile);
			print ("Deleted: " + localFile);
		}
		WWW wwwfile = new WWW (remoteFile);
		yield return wwwfile;
		// After it's downloaded.
		print ("File Size : " + wwwfile.bytes.Length);
		print ("Cache Location: " + Application.temporaryCachePath);
		// Write to local file.
		System.IO.File.WriteAllBytes (localFile, wwwfile.bytes);
		Debug.Log ("Cache saved: " + localFile);
		Debug.Log ("File downloaded");
		if (System.IO.File.Exists (localFile)) {
			Debug.Log (" file does exist");
		} else {
			Debug.Log (" file does not exist");
		}
		ReadXML ();
	}

So I write this file out locally and by loading it in notepad I can see it’s fine. But when I come to load it:

	XmlDocument xmlDoc = new XmlDocument (); // xmlDoc is the new xml document.
		xmlDoc.LoadXml (localFile); // load the file.

I receive the ‘Text node cannot appear in this state. Line 1, position 1.’ error.

Apparently BOM encoding is stored within the first byte (and this causes the issue) so I then tried to strip out the first byte:

		// Create an array with one less element than the file
		byte[] fileWithoutBom = new byte[wwwfile.bytes.Length-1];
		for (int index=1; index<fileWithoutBom.Length; index++)
			fileWithoutBom[index-1] = wwwfile.bytes[index];	

But the error persists. Can anyone help? Thanks in advance!

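You’re calling LoadXml with a file path, but LoadXml expects the XML content itself as a string. It therefore tries to parse the path as XML and fails on the very first character.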

So you should either pass the file’s contents (not its path) to LoadXml, or call Load instead, because Load expects a filename. See here and here.
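The two options might look like the sketch below. It is a minimal illustration rather than the exact snippets from the original answer: the class `XmlLoading` and its method names are hypothetical, and it uses plain .NET calls (`XmlDocument.Load`, `XmlDocument.LoadXml`, `File.ReadAllText`) so it can be tried outside Unity.

```csharp
using System.IO;
using System.Xml;

// Hypothetical helper class, named only for this illustration.
public static class XmlLoading
{
    // Option 1: Load takes a filename and reads the file itself.
    public static XmlDocument LoadFromPath(string path)
    {
        XmlDocument doc = new XmlDocument();
        doc.Load(path);
        return doc;
    }

    // Option 2: LoadXml takes the XML markup as a string,
    // so the file has to be read into a string first.
    public static XmlDocument LoadFromFileContents(string path)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(File.ReadAllText(path));
        return doc;
    }
}
```

In the question's code, either `XmlLoading.LoadFromPath(localFile)` or `XmlLoading.LoadFromFileContents(localFile)` would take the place of `xmlDoc.LoadXml (localFile)`.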

First of all, the Byte Order Mark (BOM) is not an encoding; it’s a special character (U+FEFF) that tells the receiving side in which order the bytes of multi-byte values are stored (endianness).

Next, in UTF-8 the BOM is made up of three bytes (EF BB BF), not one, so dropping only the first byte still leaves two stray bytes at the start of the data.
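If you do want to strip the BOM at the byte level, all three bytes have to be checked and skipped. A minimal sketch, assuming the data is UTF-8 (the class `BomBytes` and the method `StripUtf8Bom` are hypothetical names, not part of Unity or .NET):

```csharp
using System;

// Hypothetical helper class, named only for this illustration.
public static class BomBytes
{
    // The UTF-8 encoding of U+FEFF is the three bytes EF BB BF.
    private static readonly byte[] Utf8Bom = { 0xEF, 0xBB, 0xBF };

    // Returns a copy of data without a leading UTF-8 BOM.
    // If no BOM is present, the input array is returned unchanged.
    public static byte[] StripUtf8Bom(byte[] data)
    {
        if (data.Length < Utf8Bom.Length) return data;
        for (int i = 0; i < Utf8Bom.Length; i++)
        {
            if (data[i] != Utf8Bom[i]) return data;
        }
        byte[] result = new byte[data.Length - Utf8Bom.Length];
        Array.Copy(data, Utf8Bom.Length, result, 0, result.Length);
        return result;
    }
}
```

With the question's variables this would be called as `BomBytes.StripUtf8Bom (wwwfile.bytes)` before writing the file.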

To actually fix your problem, save the file without a BOM. Just open the XML file in Notepad++ and save it as “UTF8 without BOM”.

Alternatively, use the text property of your “wwwfile”, which is decoded into a Unicode string in which the BOM, if present, is a single character. That’s not the case for the byte array, since UTF-8, UTF-16 and other encodings use multiple bytes to encode some characters.
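On the string side, the BOM is the single character U+FEFF, so it can be trimmed before parsing. The sketch below assumes the decoded text may still start with that character; whether `wwwfile.text` actually keeps it is not something this thread confirms. The class `BomText` and the method `ParseXmlText` are hypothetical names.

```csharp
using System.Xml;

// Hypothetical helper class, named only for this illustration.
public static class BomText
{
    // Removes any leading U+FEFF from already-decoded text,
    // then parses the remaining markup with LoadXml.
    public static XmlDocument ParseXmlText(string text)
    {
        string cleaned = text.TrimStart('\uFEFF');
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(cleaned);
        return doc;
    }
}
```

In the coroutine from the question this would be called as `BomText.ParseXmlText (wwwfile.text)`, which also avoids the round trip through a local file.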