Interesting UnityScript/C# difference regarding UTF-8

OK, I finally figured out why some people have insisted that only UTF-16 encoding works for scripts that use characters outside the ASCII range, even though UTF-8 has always worked fine for me. Namely:

function Start () {
	var testString = "©®";
	Debug.Log ("Test1: " + testString);
}

vs.

using UnityEngine;

public class Test : MonoBehaviour {
	void Start () {
		var testString = "©®";
		Debug.Log ("Test2: " + testString);
	}
}

The UnityScript code prints “©®” regardless of whether the file is saved as UTF-8 or UTF-16. The C# code only prints “©®” when the file is saved as UTF-16; when it’s saved as UTF-8, it prints “???” instead. So…yeah. Not sure why, but I’ll keep that in mind from now on.

–Eric

Thank you very much for sharing such a useful thing in Unity3D.

This isn’t a Unity3D thing. It’s a C# thing.

C# uses a UTF-16 encoding by default. (.NET uses UTF-16 internally too.)
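
You can see this from inside Unity itself. A minimal sketch using the standard System.Text APIs (the class and variable names here are just for illustration):

using System.Text;
using UnityEngine;

public class Utf16Demo : MonoBehaviour {
	void Start () {
		// A .NET string is a sequence of UTF-16 code units,
		// so "©®" is two chars taking two bytes each.
		string s = "©®";
		byte[] utf16Bytes = Encoding.Unicode.GetBytes (s); // Encoding.Unicode is UTF-16 (little-endian)
		Debug.Log (s.Length + " chars, " + utf16Bytes.Length + " bytes"); // prints: 2 chars, 4 bytes
	}
}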

Javascript.Net seems to default to UTF-8 in places (presumably due to its history as a scripting language for web browsers), which is more backwards-compatible with the older ASCII / ANSI character sets.

Both .NET and Mono support ASCIIEncoding, UTF8Encoding, and other classes for converting between different encoding standards, so they’ll work in both Javascript.Net and C#. (And Boo. Mustn’t forget Boo.)
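
For example, here’s a minimal sketch of round-tripping a string through UTF-8 with those classes, plus the kind of garbling a mismatched decode produces (the class name is just for illustration, and the exact garbled output depends on which encodings are mismatched):

using System.Text;
using UnityEngine;

public class EncodingDemo : MonoBehaviour {
	void Start () {
		string s = "©®";

		// Round-trip through UTF-8: encode to bytes, decode back. Lossless.
		byte[] utf8Bytes = new UTF8Encoding ().GetBytes (s);
		string roundTripped = new UTF8Encoding ().GetString (utf8Bytes);
		Debug.Log ("Round-trip: " + roundTripped); // prints: Round-trip: ©®

		// Decode those same UTF-8 bytes as ASCII instead: each non-ASCII byte
		// has no ASCII mapping and becomes '?', much like the garbled output above.
		string garbled = new ASCIIEncoding ().GetString (utf8Bytes);
		Debug.Log ("Mismatched decode: " + garbled); // prints: Mismatched decode: ????
	}
}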