I'm trying to use TMP's HasCharacter(char, bool, bool) to check if a font supports an emoji. However, it seems to me that this is not correctly supported, since I always get false even when I know that the emoji is supported. Example code:
[MenuItem("Tools/IsEmojiSupported _F7")]
public static void IsEmojiSupported()
{
string unicodeString = "\U0001F600"; // represents 😀
string notoEmojiFontAssetPath = AssetDatabase.GUIDToAssetPath("f37c4690a2cef44419da40197760dc0e");
var notoEmojiFont = AssetDatabase.LoadAssetAtPath<TMP_FontAsset>(notoEmojiFontAssetPath);
int characterInt = Convert.ToInt32(unicodeString[0]);
bool hasCharacter = notoEmojiFont.HasCharacter(unicodeString[0]);
bool hasCharacters = notoEmojiFont.HasCharacters(unicodeString);
// Logs: Noto Emoji font has character 😀 (55357): False, False
Debug.Log($"Noto Emoji font has character {unicodeString} ({characterInt}): {hasCharacter}, {hasCharacters}");
}
Is this a bug in TMP? I don't have any issues with HasCharacter for "normal" characters.
I guess notoEmojiFont.HasCharacter can't really work for anything outside the char value range (such as 😅, which is 128517). I think what I would need is something along the lines of HasCharacter(int unicode, bool searchFallbacks, bool tryAddCharacter) to check this. Sadly, the next issue is the callback for InputField.onValidateInput, which splits such a character into two char values; it would also be easier to work with a single Unicode int there.
This is trickier than I envisioned, and I think it would need some refactoring on the TMP side to make this work. I don't really see a way to check through API calls whether the emoji to draw is actually in the emoji font used. Am I wrong here?
The following function
public bool HasCharacter(char character, bool searchFallbacks = false, bool tryAddCharacter = false)
only checks if the character is present in the font asset unless the tryAddCharacter parameter is set to true.
As you pointed out, this function only accepts a char, which is a single UTF16 code unit, and thus does not work with UTF32 characters.
The following function can be used instead. It takes in a string, in which you can specify UTF32 characters such as "\U0001F600":
public bool HasCharacters(string text, out uint[] missingCharacters, bool searchFallbacks = false, bool tryAddCharacter = false)
The following function which is internal does work with UTF32
bool HasCharacter_Internal(uint character, bool searchFallbacks = false, bool tryAddCharacter = false)
You could change this but it would require embedding the package in your project.
Adding a new variant of the public functions to take a uint (32 bit character) would certainly make sense.
There are other ways to check if a certain character is present in a font file but let me know if the above proposed solutions work for you.
P.S. Characters can be represented in strings by using \u followed by 2 hex pairs for UTF16 and \U followed by 4 hex pairs for UTF32 characters. You can also use surrogate pairs.
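To make the char limitation discussed above concrete, here is a minimal sketch in plain C# (no TMP involved). The class and method names are hypothetical and only serve the illustration; the char and string calls are standard .NET.

```csharp
using System;

// Hypothetical helper for illustration only; not part of TMP.
public static class SurrogateDemo
{
    // Returns the UTF-16 code units of a string as integers.
    public static int[] CodeUnits(string s)
    {
        var units = new int[s.Length];
        for (int i = 0; i < s.Length; i++)
        {
            units[i] = s[i];
        }
        return units;
    }

    // Returns the full code point that starts at the given index,
    // combining a surrogate pair when one is present.
    public static int CodePointAt(string s, int index)
    {
        return char.ConvertToUtf32(s, index);
    }
}
```

For "\U0001F600", CodeUnits returns 55357 and 56832 (the two surrogate halves), while CodePointAt(s, 0) returns 128512. Indexing the string with [0] therefore only yields the high surrogate, which no font contains as a glyph of its own.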
Thanks a lot for your answer. I'm not sure if embedding the UnityEngine.UI package is a good idea, since I would guess it changes quite a bit in minor versions (even though it is always at version 2.0.0), right?
I tried using the HasCharacters check together with char.ConvertFromUtf32(), but this still returns false for emojis that are actually supported. Here is the full code:
[MenuItem("Tools/IsEmojiSupported _F7")]
public static void IsEmojiSupported()
{
int unicode = 128512; // 😀
Debug.Log($"Emoji {GetStringFromUnicode(unicode)} ({unicode}) is supported: {IsEmojiSupported(unicode)}");
unicode = 9989; // ✅
Debug.Log($"Emoji {GetStringFromUnicode(unicode)} ({unicode}) is supported: {IsEmojiSupported(unicode)}");
unicode = 10084; // ❤️
Debug.Log($"Emoji {GetStringFromUnicode(unicode)} ({unicode}) is supported: {IsEmojiSupported(unicode)}");
unicode = 128517; // 😅
Debug.Log($"Emoji {GetStringFromUnicode(unicode)} ({unicode}) is supported: {IsEmojiSupported(unicode)}");
unicode = 127819; // 🍋
Debug.Log($"Emoji {GetStringFromUnicode(unicode)} ({unicode}) is supported: {IsEmojiSupported(unicode)}");
}
private static string GetStringFromUnicode(int unicode)
{
if (unicode < 0 || unicode > char.MaxValue)
{
return char.ConvertFromUtf32(unicode);
}
char character = (char)unicode;
return character.ToString();
}
public static bool IsEmojiSupported(int unicode)
{
if (unicode <= char.MaxValue && unicode >= char.MinValue)
{
// No surrogate pair, use the simpler check
char character = (char)unicode;
foreach (TMP_Asset emojiAsset in TMP_Settings.emojiFallbackTextAssets)
{
if (emojiAsset is TMP_FontAsset emojiFont && emojiFont.HasCharacter(character, false, true))
{
return true;
}
}
}
else
{
// The unicode value is a combination of two characters (surrogate pair), use string for checks
string characterString = char.ConvertFromUtf32(unicode);
foreach (TMP_Asset emojiAsset in TMP_Settings.emojiFallbackTextAssets)
{
if (emojiAsset is TMP_FontAsset emojiFont &&
emojiFont.HasCharacters(characterString, out uint[] _, false, true))
{
return true;
}
}
}
return false;
}
And the result is
Emoji 😀 (128512) is supported: False
Emoji ✅ (9989) is supported: True
Emoji ❤ (10084) is supported: True
Emoji 😅 (128517) is supported: False
Emoji 🍋 (127819) is supported: False
So the two single chars with HasCharacter return true, but the combined characters with HasCharacters check return false.
If I get this to work I can properly handle the TMP_Text.OnMissingCharacter where I get the unicode value as an int. However, this does not solve my second issue where I want to make sure users only enter characters that are allowed with the callback TMP_InputField.onValidateInput which has a char addedChar as a parameter - Is there a simple solution to that issue as well (e.g. where I get a callback for a unicode character itself)?
I just tested the following which worked as expected.
// Simple test
TextComponent.font.HasCharacters("\U000107B5", out _, false, true)
Since the above function takes in a string, which makes it possible to pass UTF32 characters, this should work correctly for you.
You should be able to implement something that uses OnSubmit or perhaps a custom validator.
With regards to custom validation, check out the following thread which should contain useful information.
In the custom validator, you will likely need to check if the first character is a high surrogate and, if so, accept it and then check the 2nd character to validate the full UTF32. If the combination of the two is invalid, then reject the input.
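The validation approach described above could be sketched roughly as follows. This is plain C# and not a TMP API: the class SurrogateAwareValidator and its isCodePointAllowed predicate are hypothetical names, the Validate method merely mirrors the (string, int, char) parameter shape of the onValidateInput callback, and it assumes that returning '\0' means "reject this character".

```csharp
using System;

// Hypothetical helper for illustration only; not part of TMP.
public sealed class SurrogateAwareValidator
{
    private readonly Func<int, bool> isCodePointAllowed;
    private char pendingHighSurrogate; // '\0' while no high surrogate is pending

    public SurrogateAwareValidator(Func<int, bool> isCodePointAllowed)
    {
        this.isCodePointAllowed = isCodePointAllowed;
    }

    // Assumption: returning '\0' rejects the added character.
    public char Validate(string text, int charIndex, char addedChar)
    {
        if (char.IsHighSurrogate(addedChar))
        {
            // Accept provisionally; the decision is made on the low surrogate.
            pendingHighSurrogate = addedChar;
            return addedChar;
        }

        if (char.IsLowSurrogate(addedChar))
        {
            char high = pendingHighSurrogate;
            pendingHighSurrogate = '\0';
            if (high == '\0')
            {
                return '\0'; // lone low surrogate, nothing to pair it with
            }
            int codePoint = char.ConvertToUtf32(high, addedChar);
            return isCodePointAllowed(codePoint) ? addedChar : '\0';
        }

        // Ordinary character: the char value is the code point.
        pendingHighSurrogate = '\0';
        return isCodePointAllowed(addedChar) ? addedChar : '\0';
    }
}
```

One caveat of this sketch: when the pair is rejected on the low surrogate, the high surrogate has already been accepted, so the caller would still have to remove that dangling character from the text. The predicate could wrap whatever font check ends up working for the code point in question.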
P.S. It has been a long while since I looked at this part of the input field and some of those functions. As it stands, it is clear that making some improvements to HasCharacter and Input Field validation to better handle UTF32 would be nice.
I'll take a deeper look over the weekend and get back to you.
I just tested the following which worked as expected.
Huh, that is very surprising to me. I checked again for your string and other emojis for different emoji font assets and always got a false on hasCharacters for anything with a surrogate:
[MenuItem("Tools/IsEmojiStringSupported _F7")]
public static void IsEmojiStringSupported()
{
CheckForEmojiString("\U00002705"); // ✅
CheckForEmojiString("\U0001F600"); // 😀
CheckForEmojiString("\U0001F34B"); // 🍋
CheckForEmojiString("\U000107B5"); // Example from Stephan
}
private static void CheckForEmojiString(string characterString)
{
foreach (TMP_Asset emojiAsset in TMP_Settings.emojiFallbackTextAssets)
{
uint[] missingCharacters = Array.Empty<uint>();
if (emojiAsset is TMP_FontAsset emojiFont &&
emojiFont.HasCharacters(characterString, out missingCharacters, false, true))
{
Debug.Log($"{emojiAsset.name} includes '{characterString}'");
}
else
{
Debug.Log($"{emojiAsset.name} does not include '{characterString}', missing characters: {string.Join(", ", missingCharacters)}");
}
}
}
Result:
NotoColorEmoji Color includes '✅'
NotoColorEmoji Color does not include '😀', missing characters: 55357, 56832
NotoColorEmoji Color does not include '🍋', missing characters: 55356, 57163
NotoColorEmoji Color does not include '𐞵', missing characters: 55297, 57269
I checked it both when the char was already used by the font (which is set to dynamic) and when it was not yet generated. I also ran the script in play mode and in edit mode, which didn't make any difference either.
In the custom validator, you will likely need to check if the first character is a high surrogate and, if so, accept it and then check the 2nd character to validate the full UTF32. If the combination of the two is invalid, then reject the input.
Yeah, that's what I thought about too, but since it is a bit cumbersome to implement, I was hoping that there was a built-in solution for Unicode validation instead of char validation.
As it stands, it is clear that making some improvements to HasCharacter and Input Field validation to better handle UTF32 would be nice.
Yep, that's what I would wish for. I didn't run into any problems with the methods when working with "normal" characters, but in the UTF32 world they are a bit outdated.
Just for testing, can you try something like this
private void Awake()
{
Debug.Log(TextComponent.font.HasCharacters("\U0001F34B", out _, false, true));
}
using some script assigned to some GameObject that has a public property referencing a TMP_Text component?
I am curious to see if the issue you are running into could come from the fallback search.
UPDATE:
Looks like the HasCharacters function below does not correctly handle surrogate pairs and was also incorrectly returning true in my tests.
public bool HasCharacters(string text, out uint[] missingCharacters, bool searchFallbacks = false, bool tryAddCharacter = false)
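The missing characters reported earlier in the thread (for example 55357 and 56832) are the two halves of a surrogate pair, which suggests the string is being walked one UTF-16 code unit at a time. As a rough sketch of the alternative, the plain C# helper below walks a string by full code points instead. The names CodePoints and FromString are hypothetical, and this is not how TMP is implemented internally.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only; not part of TMP.
public static class CodePoints
{
    // Converts a UTF-16 string into a list of full Unicode code points.
    public static List<uint> FromString(string text)
    {
        var result = new List<uint>();
        for (int i = 0; i < text.Length; i++)
        {
            bool isPair = char.IsHighSurrogate(text[i])
                          && i + 1 < text.Length
                          && char.IsLowSurrogate(text[i + 1]);
            if (isPair)
            {
                // Combine both halves into one code point and skip the low surrogate.
                result.Add((uint)char.ConvertToUtf32(text[i], text[i + 1]));
                i++;
            }
            else
            {
                // BMP character (or an unpaired surrogate, passed through as-is).
                result.Add(text[i]);
            }
        }
        return result;
    }
}
```

With this, "\U0001F600" yields the single value 128512 rather than 55357 and 56832, which is the kind of value a uint-based lookup would need.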
I tested what you suggested, but this still spits out false. Here you can find a demo. The code for the check is this one:
using TMPro;
using UnityEngine;
public class CheckHasEmoji : MonoBehaviour
{
[SerializeField]
private TMP_Text[] textsToCheck;
private void Start()
{
LogHasEmoji();
}
public void LogHasEmoji()
{
string emojiToCheck = "\U0001F34B"; // 🍋
foreach (TMP_Text text in textsToCheck)
{
Debug.Log($"Text {text.name} with font asset {text.font.name} supports emoji {emojiToCheck}: {text.font.HasCharacters(emojiToCheck, out _, false, true)}");
}
}
}
It checks one TMP Text that has the emoji font directly applied and one that uses LiberationSans and therefore the fallback. Also interesting in that regard: apparently the emoji fallback is also used for the text that has the Noto emoji font applied (you can see this when switching the emoji fallback through the buttons).
@Stephan_B Ah, just noticed your edit now - Would you like me to file a bug report/support ticket?