Problem converting text with \uXXXX escapes to text with accents

Hi guys,
I need help converting this type of string:

La cath\u00e9drale Notre-Dame de Paris a \u00e9t\u00e9 reproduite dans les moindres d\u00e9tails pour ce point’n click particuli\u00e8rement reposant." JeuxVideo.com\n\n"L\u2019application est une jolie d\u00e9couverte! Elle vous permettra d\u2019apprendre de mani\u00e8re agr\u00e9able et p\u00e9dagogique une partie de l\u2019histoire de la ville de Paris.\

Anyone?

I tried :

text = text.Replace("\u2019", "'");

but I got an error : BCE0044: unexpected char: ‘u’.

"\" is an escape character; if you want the literal value, use "\\". (Can’t you just get the text in UTF-8 so you don’t have to do this in the first place though?)

–Eric
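
To make the escape-versus-literal point concrete, here is a small sketch. It is written in Python rather than UnityScript, purely as an illustration, and the variable names are made up; the same idea applies to the Replace call above.

```python
# "\u2019" in source code is ONE character (the right single quote).
escaped = "\u2019"
# "\\u2019" is SIX characters: a backslash, 'u', and four hex digits.
literal = "\\u2019"

# Text downloaded as-is contains the six-character form.
downloaded = "L\\u2019application"

# Searching for the one-character form finds nothing to replace...
unchanged = downloaded.replace(escaped, "'")
# ...while searching for the literal backslash sequence works.
fixed = downloaded.replace(literal, "'")
```

So the search string has to contain a real backslash, which means doubling it in the source code.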

Well, it’s text I get from the internet using WWW.

http://ax.itunes.apple.com/WebObjects/MZStoreServices.woa/wa/wsLookup?id=451763560&country=fr

Even the \n sequences come through as literal text.

Is there a way to convert it automatically?

help needed!

Is www.text Unicode?
I can’t find a way to convert \n to a line return.
See my previous post; does anyone know a way to convert the text?

I believe Eric means that it should be:

text = text.Replace("\\u2019", "'");

There’s probably a better way to convert escaped unicode to regular text, but if that’s really the only character you need converted, this is the most efficient way. I don’t believe Unity supports Unicode, though, so if you manage to convert to Unicode, you’ll just have to convert again to something Unity supports.

Yes, but how do I convert the \n codes?

text = text.Replace("\\n", ???);

Do you want them to be linebreaks?

text = text.Replace("\\n", System.Environment.NewLine);

Hi, replacing every escaped character one by one seems very tedious; see this table: Unicode Character 'LATIN SMALL LETTER A WITH CIRCUMFLEX' (U+00E2)

I tried the code below, but it doesn’t seem to have any effect on the escapes:

Any ideas?

I also tested this, but it does not work:

I still get these damn \uXXXX sequences: “Recevez des \u00e9motic\u00f4nes sur votre \ue00a appareil! Exprimez vous autrement et de mani\u00e8re cr\u00e9ative!”

Anyone can help me to convert this text automatically ? (see my 2 previous posts)

Anyone???

I would loop through every instance of “\u”, get the 4 characters following it, and use int.Parse and char.ConvertFromUtf32() to get the actual character. Tedious, but should work.
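
A rough sketch of that loop, in Python rather than UnityScript just to show the logic. The function name decode_unicode_escapes is made up for the example. Note that the four characters are hexadecimal digits, so they have to be parsed in base 16.

```python
import string

def decode_unicode_escapes(text):
    """Replace each literal \\uXXXX sequence with the character it names."""
    out = []
    i = 0
    while i < len(text):
        hex_digits = text[i + 2:i + 6]
        is_escape = (
            text.startswith("\\u", i)
            and len(hex_digits) == 4
            and all(c in string.hexdigits for c in hex_digits)
        )
        if is_escape:
            # Parse the four digits as a base-16 number, then map to a character.
            out.append(chr(int(hex_digits, 16)))
            i += 6
        else:
            # Not an escape (or a malformed one): copy the character through.
            out.append(text[i])
            i += 1
    return "".join(out)
```

Walking the string once like this also means a malformed or truncated escape is left alone instead of raising an error.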

Hi there, I got the first \u value: \u00c7

When I pass the “00c7” string to int.Parse I get an error: “Input string was not in the correct format”

Here is my code:

while (text.Contains("\\u"))
	{
		// Look for the system tag
		var tSystemStart : int = text.IndexOf("\\u");
		var tSystemEnd : int = tSystemStart+6;
		if ((tSystemStart > -1) && (tSystemEnd > tSystemStart)) {
			
			// Strip the system start/end elements
			var tStripStart : int = tSystemStart + 2;
			var tStripLength : int = tSystemEnd - tStripStart;
			var aString = text.Substring(tStripStart, tStripLength);
			
		}
		
		Debug.Log (aString); // -> Returns 00c7 OK
		var aInt = int.Parse("\\u"+aString); // ERROR HERE !

// I also tried : var aInt = int.Parse(aString); // Without "\\u" same ERROR HERE !

		Debug.Log (aInt);
		
	break; // not finished code

	
	}

Found the solution here (phew, not easy to find):

import System.Globalization;

var aInt = System.Int32.Parse( aString, NumberStyles.AllowHexSpecifier );

Thanks to all for the help!

Final code
Enjoy!

import System.Text;
import  System;
import System.Globalization;


while (textStore.Contains("\\u")) 
	{
		// Look for the next \uXXXX escape
		var tSystemStart : int = textStore.IndexOf("\\u");
		var tSystemEnd : int = tSystemStart+6;
		if ((tSystemStart > -1) && (tSystemEnd > tSystemStart)) {
			
			// Extract the four hex digits that follow "\u"
			var tStripStart : int = tSystemStart + 2;
			var tStripLength : int = tSystemEnd - tStripStart;
			var aXMLString = textStore.Substring(tStripStart, tStripLength);
			
		}
		
		//Debug.Log (aXMLString);
		
		var aXmlInt = System.Int32.Parse( aXMLString, NumberStyles.AllowHexSpecifier ); // hex string to int
		
		//Debug.Log (aXmlInt);
		
		//Debug.Log(char.ConvertFromUtf32(parseInt(aXmlInt))); // code point to character
		
		textStore = textStore.Replace("\\u"+aXMLString, char.ConvertFromUtf32(parseInt(aXmlInt))); // replace every occurrence of this escape
	if (!textStore.Contains("\\u")) break;
	}

Hi, after a few days I hit a new error when the previous script runs into invalid Unicode characters:

textStore = textStore.Replace("\\u"+aXMLString, char.ConvertFromUtf32(parseInt(aXmlInt)));

ERROR :

it appears with some unusual characters such as : \ud83c …

Is there an easy way to detect those invalid unicode characters and replace them to avoid the error on the code line above?

If all you’re looking to do is avoid erroring out, why not just catch the exception and replace with an empty string?
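
For what it’s worth, \ud83c is the first half of a UTF-16 surrogate pair: characters above U+FFFF (emoji, for example) are written as two \uXXXX escapes, and char.ConvertFromUtf32 rejects either half on its own. Below is a sketch of combining pairs and dropping unpaired halves, in Python rather than UnityScript for illustration only. The function name and the fallback parameter are made up for the example.

```python
import re

# Either a high+low surrogate pair of escapes, or any single \uXXXX escape.
_PAIR_OR_SINGLE = re.compile(
    r"\\u([dD][89abAB][0-9a-fA-F]{2})\\u([dD][c-fC-F][0-9a-fA-F]{2})"
    r"|\\u([0-9a-fA-F]{4})"
)

def decode_with_surrogates(text, fallback=""):
    """Decode \\uXXXX escapes, joining surrogate pairs into one character."""
    def substitute(match):
        high, low, single = match.groups()
        if high is not None:
            # Combine the two 16-bit halves into one code point above U+FFFF.
            code = 0x10000 + ((int(high, 16) - 0xD800) << 10) + (int(low, 16) - 0xDC00)
            return chr(code)
        code = int(single, 16)
        if 0xD800 <= code <= 0xDFFF:
            # Unpaired surrogate: not a valid character on its own.
            return fallback
        return chr(code)
    return _PAIR_OR_SINGLE.sub(substitute, text)
```

With the default fallback an unpaired half is simply removed, which matches the "replace with an empty string" suggestion without relying on an exception.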