After beating my head against the wall trying to figure out how to read a UTF-16 file (saved as Unicode from WordPad) in PHP so I can process foreign characters, I’m thinking of trying it in Unity instead since PHP has a convoluted method of dealing with Unicode. Is there a straightforward way of reading UTF-16 files and changing foreign characters to my own code and then saving the result to a normal text file? I just need to read a file that includes foreign characters, convert them to a format in normal text that I can later process in a different program, then save the file as normal text.
I’m not sure I understand why C# would be simpler than this:
https://www.php.net/manual/en/function.mb-convert-encoding.php
File.WriteAllLines(file, lines, Encoding.Unicode); // UTF-16 LE, written with a BOM
Normally, though, people use UTF-8 for foreign characters. It can encode the entire Unicode range and is more compact for ASCII text.
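For what it's worth, here is a minimal sketch of the round trip asked about: read the UTF-16 file, write it back out as UTF-8. The class and method names (`Utf16ToUtf8`, `ConvertFile`) and the file paths are made up for illustration, and it assumes the WordPad file is UTF-16 little-endian, which is what `Encoding.Unicode` means in .NET.

```csharp
using System.IO;
using System.Text;

// Hypothetical helper, not part of Unity or .NET.
public static class Utf16ToUtf8
{
    // Reads a UTF-16 LE text file and rewrites its contents as UTF-8.
    public static void ConvertFile(string inputPath, string outputPath)
    {
        // Encoding.Unicode is UTF-16 little-endian. ReadAllText also checks
        // for a byte order mark and does not include it in the returned string.
        string text = File.ReadAllText(inputPath, Encoding.Unicode);

        // UTF8Encoding(false) writes UTF-8 without a BOM; pass true to emit one.
        File.WriteAllText(outputPath, text, new UTF8Encoding(false));
    }
}
```

Usage would be something like `Utf16ToUtf8.ConvertFile("input.txt", "output.txt");`. Any character replacement could be done on `text` with ordinary `string.Replace` calls before writing.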
Because you also need to use special string functions, and (according to one post) the PHP file itself needs to be encoded the same way, and lots of other voodoo. When I tested out a simple bit of code, the results didn’t make any sense.
Another weird problem: I decided to just use BabelPad’s batch replace function to swap all the Unicode characters for my own ASCII codes that I can later use to reconstruct the correct characters. However, the batch replace only works if I copy-and-paste Unicode characters directly from the file I’m trying to modify (after loading it into BabelPad), not from a separate file that holds my list of characters to swap. The odd thing is that the latter show up fine in BabelPad’s batch replace input field, as if BabelPad is interpreting them correctly, but the replace doesn’t do anything. My only explanation is that BabelPad apparently converts the file I’m modifying from UTF-16 to UTF-8, whereas if I copy from the list of swappable characters (a WordPad file which is also in UTF-16), the copy-and-paste doesn’t convert it to UTF-8 and hence it doesn’t work? Is that the correct interpretation? Either way, it’s surreal that there are so many endless headaches just to deal with foreign characters.
There’s a good chance that you’re missing a BOM somewhere.
https://en.wikipedia.org/wiki/Byte_order_mark
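A quick way to check whether a file starts with a BOM is to look at its first few bytes. This is only a sketch; `BomCheck` and its methods are hypothetical names, and it only knows about the UTF-8 and UTF-16 marks.

```csharp
using System.IO;

// Hypothetical helper for inspecting the start of a file.
public static class BomCheck
{
    // Returns a label for the BOM at the start of the bytes, or "none".
    // Note: a UTF-32 LE BOM (FF FE 00 00) would be reported as "UTF-16 LE".
    public static string Describe(byte[] bytes)
    {
        if (bytes.Length >= 3 && bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF)
            return "UTF-8";
        if (bytes.Length >= 2 && bytes[0] == 0xFF && bytes[1] == 0xFE)
            return "UTF-16 LE";
        if (bytes.Length >= 2 && bytes[0] == 0xFE && bytes[1] == 0xFF)
            return "UTF-16 BE";
        return "none";
    }

    // Convenience wrapper that reads the whole file and inspects it.
    public static string DescribeFile(string path)
    {
        return Describe(File.ReadAllBytes(path));
    }
}
```

If `DescribeFile` returns "none" for a file that is supposed to be UTF-16, a program that relies on the BOM to detect the encoding may guess wrong.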
Also, inventing custom ASCII codes for Unicode characters is a bad idea. Just use UTF-8.
I found that if I copy and paste one of the Unicode characters from the file to my list of characters to swap, then it works, but not the other way around. In fact, I can’t even use BabelPad to search for any of the Unicode characters unless I copy them from that specific file. E.g. if I search for “Š” then it can’t find it, but if I copy the same character from the file I’m searching then it finds it.
It means your “list” might not be storing the data correctly on the clipboard.
And the character you search for might have a different code compared to what’s in the file.
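One possible cause (an assumption, nothing in this thread confirms it) is that one copy of “Š” is the single precomposed character U+0160 while the other is a plain “S” followed by the combining caron U+030C. They look identical on screen but don't match in a search. A sketch for checking this, with hypothetical names `CodeUnits`, `Dump` and `SameAfterNormalizing`:

```csharp
using System;
using System.Linq;
using System.Text;

// Hypothetical helper for comparing look-alike strings.
public static class CodeUnits
{
    // Lists the UTF-16 code units of a string as space-separated "U+XXXX" values.
    public static string Dump(string s)
    {
        return string.Join(" ", s.Select(c => "U+" + ((int)c).ToString("X4")));
    }

    // True if the two strings are identical after canonical composition (NFC).
    public static bool SameAfterNormalizing(string a, string b)
    {
        return string.Equals(a.Normalize(NormalizationForm.FormC),
                             b.Normalize(NormalizationForm.FormC),
                             StringComparison.Ordinal);
    }
}
```

`CodeUnits.Dump("\u0160")` gives `U+0160`, while `CodeUnits.Dump("S\u030C")` gives `U+0053 U+030C`. If the two sources dump differently but `SameAfterNormalizing` returns true, normalizing both to the same form before searching or replacing should make them match.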