Needs Unicode support

Hello!
I am currently using TextMesh Pro and so far so good!
But I come from Asia, and we are wondering…
Is it possible to add Unicode / Asian CJK font support to TextMesh Pro?

Anyway, thanks for all your hard work! :slight_smile:

TextMesh Pro does support full UTF32 / Unicode. When you create your font assets, you have to include the characters that you wish to use. For Asian languages whose character sets are larger, you might also need to create a few fallback font assets.

See the Font Asset Creation video which covers how to include characters for any languages as well as important options like “Characters from File”.

You should also watch the video about the Font Fallback as well as this one to allow combining symbols in the same object such as FontAwesome.

Lastly make sure that you are familiar with Material Presets as this is also an important part of using TMP and avoiding resource duplication.

Here is some additional information that I wrote previously about handling CJK and localization in general.


In terms of how to handle the mapping of these characters, here are my suggestions.

I recommend creating a Primary SDF Font Asset which will contain all the known / used Chinese characters in the project. By known I mean those used in your menus and text components, but not those which might potentially come from user input. This will result in an SDF Font Asset which most likely contains fewer than 1000 characters. (P.S. I would actually love to know what you end up with in terms of character count.) To input the list of characters, I always use “Characters from File” since the text used in your project should already exist in some text file (encoded as Unicode).

Next, I would create 3 additional Fallback font assets which would contain the rest of the 8105 characters from the Table of General Standard Chinese Characters that are not already present in your Primary Font Asset. The first Fallback would contain the first 3500 characters of the table, minus those already in your Primary. The second would contain the next 3000, again minus those in the Primary, and the third would contain the remaining 1605, minus those in the Primary.

This will give you (1) Primary SDF Font Asset to which you will assign these (3) Fallback SDF Font Assets.
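The split described above can be prepared offline before creating the font assets. The following is a minimal Python sketch, not part of TMP: the function name and the tiny stand-in table are hypothetical, and it assumes you have the standard character table as one string in its official order. Each resulting list could then be saved to a text file and fed to “Characters from File”.

```python
def split_font_character_sets(project_text, standard_table, tier_sizes=(3500, 3000, 1605)):
    """Return (primary, fallbacks): the unique characters used in the project,
    plus one list per tier of the table with the primary characters removed."""
    primary = []
    seen = set()
    for ch in project_text:
        # Skip line breaks; they are not glyphs that belong in the font asset.
        if ch in "\r\n" or ch in seen:
            continue
        seen.add(ch)
        primary.append(ch)

    fallbacks = []
    start = 0
    for size in tier_sizes:
        tier = standard_table[start:start + size]
        fallbacks.append([ch for ch in tier if ch not in seen])
        start += size
    return primary, fallbacks


# Tiny stand-in data (hypothetical): a 5-character "table" split into tiers of 2, 2 and 1.
primary, fallbacks = split_font_character_sets("你好 好\n", "你我他好们", tier_sizes=(2, 2, 1))
print(primary)
print(fallbacks)
```

With the real table, the default tier sizes of 3500, 3000 and 1605 add up to the 8105 characters mentioned above.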

When creating these Fallback Font Assets, the sampling point size and padding and texture resolution do not need to be the same.

In order for the visual appearance of things like Outline, Shadow, etc. to be consistent, you have to maintain the same ratio of Sampling Point Size to Padding. So if the Primary is using a Sampling Point Size of 120 with a padding of 10, then your Fallbacks could be using a Sampling Point Size of 60 with a padding of 5. You can control the sampling point size by setting a value manually instead of using Auto Sizing on Point Size.
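The ratio rule is simple arithmetic, sketched here in Python. The helper name is hypothetical, and the whole-number check assumes you want an integer padding value; it only reproduces the 120/10 and 60/5 example above.

```python
from fractions import Fraction


def matching_padding(primary_point_size, primary_padding, fallback_point_size):
    """Padding that gives the fallback the same point-size-to-padding ratio as the primary."""
    ratio = Fraction(primary_point_size, primary_padding)
    return Fraction(fallback_point_size) / ratio


padding = matching_padding(120, 10, 60)
print(padding)  # 5

# A result that is not a whole number suggests picking another point size.
odd = matching_padding(120, 10, 90)
print(odd, odd.denominator == 1)  # 15/2 False
```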

So for all the known text, I usually maximize the sampling point size since I know these characters are contained in my project and I want them to always look great. However, for the Fallbacks where only a few might be used in the context of user input which might not be visible on screen much, I use lower quality settings which allows me to save on texture size / resources.


WOW!
Appreciate your kind assistance!
I’ll give it a try.

Hey @Stephan_B ,

I hope you don’t mind if I add a round of questions about Unicode support here, since I’m struggling to find accurate documentation:

  • Does this apply to all platforms, including Android/iOS?

  • I read in one of your recent posts that GSUB tables are not supported yet. Does this not impact Asian fonts?

  • How does this apply to input fields, rather than text component / render?

  • Specifically, does user input work for codepoints beyond Basic Multilingual Plane? I believe Unity currently only supports BMP, but I’m not sure about TMP input field.

Thank you!
Rsam.

This applies to all platforms.

FreeType, which Unity and TMP use to rasterize glyphs, does not provide access to the GPOS and GSUB tables. These tables contain the “Font Features”, which include among other things ligatures, diacritical marks, glyph substitutions, kerning, etc.

Although most languages use some font features like kerning, languages like Arabic, Thai, Bengali, etc. rely heavily on these features. CJK languages, like most Latin-script languages, don’t rely on them as much. Regardless, support for Font Features is planned for the Integrated version of TMP.

TMP currently supports the full range of Unicode. UTF16 characters can be accessed with \u03A9 (2 hex pairs) while UTF32 is \U0001F600 (4 hex pairs).

Strings in C# are 16-bit, so you also have to use \u or \U escapes, or surrogate pairs, to access UTF32 characters.
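To make the relationship between the \U form and a surrogate pair concrete, here is a small Python sketch of the standard UTF-16 encoding arithmetic. The function names are hypothetical; the output strings are simply the escape sequences you would type into a C# string.

```python
def to_surrogate_pair(code_point):
    """Split a code point above U+FFFF into its UTF-16 high and low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above the BMP need a surrogate pair")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return high, low


def escape_forms(code_point):
    """Return the escape sequences that reference the code point in a 16-bit string."""
    if code_point <= 0xFFFF:
        return [f"\\u{code_point:04X}"]
    high, low = to_surrogate_pair(code_point)
    return [f"\\U{code_point:08X}", f"\\u{high:04X}\\u{low:04X}"]


print(escape_forms(0x03A9))   # ['\\u03A9']
print(escape_forms(0x1F600))  # ['\\U0001F600', '\\uD83D\\uDE00']
```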

Although we can access the full Unicode range in strings or editor input field using the information above, the Text Input Field relies on the Event Class in Unity which is required to process keyboard input. Currently and depending on the platform, UTF32 input doesn’t always work. Some additional work will be required to update classes like the Event Class to make sure we get the correct Unicode input on all platforms.

Is it possible to add this integration to the roadmap? https://unity3d.com/unity/roadmap

I am sure it will get added at some point as besides supporting current TMP users, this is my primary focus.

Hi,
Thank you for the detailed explanation of the problems with importing Asian languages into Unity.

However, some work has already been done on bringing RTL Asian languages like Arabic/Persian to Unity, e.g. UPersian (by ElectroGryphon), which is based on ArabicSupport for Unity (by Konash).
They have added RTL support to Unity nicely, and the asset’s handling of typographic ligatures is almost perfect.

Could you please let me know if there is any way to combine e.g. UPersian with TMP?

regards,

This is something that the author(s) of UPersian could certainly explore.

I hope they will, although I also hope TMP develops RTL features as well.

Thanks,

TMP has basic RTL support but does not currently support glyph re-ordering which UPersian does.

It looks like UPersian could be used like the old Arabic asset as described on the TMP user forum. See the following thread / post.

Native support for glyph re-ordering as well as OpenType font features is planned for the new text system that will eventually replace TextMesh Pro.

Wow… excellent!
Thank you, dear Stephan, for your reference to:
http://digitalnativestudios.com/forum/index.php?topic=462.msg8705#msg8705

The ReverseText function in the above thread was the missing piece.
The reversed flow of the Persian text (bottom-up) was fixed by reversing the characters in each paragraph of the string.

Thanks for your great support

Hi,

I seem to have a problem with some UTF32 characters. Many hours of trial and error, forum reading and trying again didn’t solve it, so here it goes.

The problematic characters are the following: (U+27607), (U+20089), (U+201A2), (U+20086), (U+20087). The font file does contain these and yes, they have the right Unicode values. First, I tried adding Characters from File. This is the glyph info output:

Characters packed: 0/9
Missing Characters

ID: 55389 Hex: D85D Char [í¡]
ID: 56839 Hex: DE07 Char [í¸‡]
ID: 55360 Hex: D840 Char [í¡€]
ID: 56457 Hex: DC89 Char [í²‰]
ID: 56738 Hex: DDA2 Char [í¶¢]
ID: 56454 Hex: DC86 Char [í²†]
ID: 56455 Hex: DC87 Char [í²‡]
ID: 13 Hex: D Char [ ]
ID: 10 Hex: A Char [ ]

It says 0/9, and the Hex values are all from the surrogate areas: D800 to DBFF are high surrogates and DC00 to DFFF are low surrogates (https://www.unicode.org/charts/PDF/UDC00.pdf). If I’m getting this right, this method reads only UTF16 code units and splits the 32-bit characters into their surrogate pairs.
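That reading can be checked outside Unity. The Python sketch below (helper name hypothetical) encodes the five intended characters as UTF-16 and compares the unique code units with the IDs in the report. It only illustrates the arithmetic and says nothing about how the font asset creator reads files internally.

```python
def utf16_code_units(text):
    """Return the 16-bit code units of text when encoded as UTF-16."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


intended = [0x27607, 0x20089, 0x201A2, 0x20086, 0x20087]
reported_ids = [55389, 56839, 55360, 56457, 56738, 56454, 56455]

units = utf16_code_units("".join(chr(cp) for cp in intended))
unique_units = list(dict.fromkeys(units))  # drop repeats, keep first-seen order

print([f"{u:04X}" for u in unique_units])
print(unique_units == reported_ids)
```

The high surrogate D840 is shared by four of the five characters, which would explain why the report lists seven surrogate values rather than ten; the other two entries, 13 and 10, are the carriage return and line feed.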

But when I try to add a character by Unicode Range (Hex) as in \U000201A2 the glyph info tells me “Characters packed: 0/0”. Sooo what am I doing wrong? Please help me!

Thanks in advance!

Can you provide me with a link to this font file?

Sure: https://www.dropbox.com/s/0cyii33fdhr7o9r/MaruMissingkit2.ttf?dl=0

Thanks for the quick response!

Thank you for providing the source font file.

Here are the settings that I used to create a font asset that contains these glyphs.

This is the hex character sequence that I entered

A4,2E85,2E89,2E8D,2E96,2E98,2EA1,2EA3,2EA8,2EAD,2EB9,2EBE,2EC2,2ECF,4491,4EBC,5315,20086,20087,20089,201A2,27607

This font only contains 25 glyphs as you can see in the image below.
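A comma-separated hex sequence like the one above can be generated from a string of characters instead of being typed by hand. This is a minimal Python sketch with a hypothetical helper name; it emits bare hex values with no \u or \U prefix, matching the format of the sequence shown.

```python
def hex_sequence(text):
    """Build a sorted, comma-separated list of bare hex code points from text."""
    code_points = sorted({ord(ch) for ch in text if ch not in "\r\n"})
    return ",".join(f"{cp:X}" for cp in code_points)


sample = "\u00A4\U000201A2\u4EBC\n"
print(hex_sequence(sample))  # A4,4EBC,201A2
```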

Wow, thanks! It seems that I added the hex range in the wrong format. Somewhere I read that it should start with \U… Or is that for the Unity editor?

Also, just out of curiosity, why didn’t the characters from file work?

Anyway, thanks for the instant response. Your help is greatly appreciated!

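When referencing a UTF16 or UTF32 character in the text, you need to use \uXXXX (four hex digits, e.g. \u03A9) for UTF16 and \UXXXXXXXX (eight hex digits, e.g. \U0001F600) for UTF32, which is the standard convention for referencing those in C# strings. The hex character sequence for a font asset, as in the sequence above, takes the bare hex values instead.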

BTW: This information is covered in the Font Asset Creation video.

Oh, I see now what I got wrong. Thank you again!

I am trying to create a font asset that supports the Russian language. I have tried every combination, but all the characters are reported as missing.
When I use the same font with a normal text component, I can see the Russian characters. How can I fix this?