Are user generated strings safe to print?

Users can have nicknames that will be displayed to other users.
The only validation I do is maximum 20 characters.
I put these strings in a Unity UI Text component, where they will be displayed to other users.
Is there some harm this could do?
Is Unity in danger from user input, like browsers (and databases) are?
I noticed Unity parses HTML-like tags, so users could inject those, but that’s not destructive, I guess.
Is there anything else I should be worried about that could break the game or attack other users?

*Later edit: I decided to take Fattie’s advice. I guess you can’t be too paranoid about user input.
I now allow only alphanumeric characters, " " and “-”, in both Unity and PHP.

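    // Requires: using System.Text.RegularExpressions;
    public string transformIntoAlphaNumeric(string taintedString)
    {
        // Remove every character that is not a letter, digit, space or hyphen.
        Regex rgx = new Regex("[^a-zA-Z0-9 -]");
        return rgx.Replace(taintedString, "");
    }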

Like others have said, a string entered in a text field can’t possibly do any serious harm to your Unity application from Unity’s point of view. The worst thing that might happen is that some strange Unicode characters result in visual glitches, but nothing that could break the game.

However, it depends on how else you use the string. For example, if you store it or save / transmit it via JSON / XML / … you should make sure that special characters in those strings are escaped. Most of these frameworks do the escaping out of the box, though. So you can literally use any string, as long as the way you use it doesn’t expect some kind of “special format”.

For example, concatenating several player names into one string separated with “;” will fail if a name contains a “;”.
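To make the delimiter problem concrete, here is a minimal sketch. The class and method names are made up for illustration; it only assumes a naive join-and-split scheme like the one described above.

```csharp
using System;

public static class DelimiterDemo
{
    // Joins names with ';' and splits them again, as a naive protocol might.
    public static string[] RoundTrip(string[] names)
    {
        string packed = string.Join(";", names);
        return packed.Split(';');
    }

    public static void Main()
    {
        string[] names = { "Alice", "Bob;Eve" };
        string[] result = RoundTrip(names);
        // The ';' inside the second name is indistinguishable from a separator.
        Console.WriteLine(names.Length + " names in, " + result.Length + " names out");
        // prints "2 names in, 3 names out"
    }
}
```

Either forbid the separator character in names or use a format that escapes it for you.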

I wouldn’t be too restrictive about what the user can input. Fattie’s suggestion to limit names to a-zA-Z is quite impractical. Any user who doesn’t live in an English-speaking country will hate your game. There is more than ASCII.

It’s all a matter of what you want to allow in your game. For example if you have some sort of faction / clan management and if a user is in a clan the clan tag is displayed in square brackets before the name, you probably don’t want the user to be able to include square brackets in their name.

I personally would filter out newline, carriage return, tab and the other control characters (everything below Unicode U+0020).

I have just written a small helper class which allows you to create your own “filter rules”. It also has a “CheckForProblems” method which returns a list of things which aren’t right in the passed string. It lists every excluded item only once, so if there are 3 newline characters in the string it will only report the first one. The class is designed to avoid creating garbage during filtering. It uses a StringBuilder to actually perform the filtering. The “Problem” struct should also avoid garbage if you cache the List.
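This is not the helper class itself, just a minimal sketch of the simplest rule mentioned above (drop everything below U+0020). The names `NameFilter` and `StripControlChars` are made up for illustration, and unlike the helper class this version allocates a new StringBuilder on every call.

```csharp
using System;
using System.Text;

public static class NameFilter
{
    // Returns the input without any character below U+0020
    // (newline, carriage return, tab and the other control characters in that range).
    public static string StripControlChars(string input)
    {
        var sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            if (c >= '\u0020')
                sb.Append(c);
        }
        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(StripControlChars("Bad\r\n\tName")); // prints "BadName"
    }
}
```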

edit
Since you mentioned SQL injection, I think I should add this: things like SQL injection are only a problem when you don’t escape your string properly for your use case.

Example:

Imagine you get a userName from a web form and want to create an SQL query with it. I don’t use PHP, so my example is in C#, but the idea is the same:

    string userName; // what the user typed in
    string sqlQuery = "select * from users where username='"+userName+"'";

Some people think that since they enclose the string in single quotation marks, it doesn’t matter what the user enters. That’s actually partially true: as long as the text stays quoted, it doesn’t matter. However, the user could type a single quote themselves. That would terminate the quoted string and allow them to append something to the query.

    userName = "';drop database;select '";

Such a string would kill the whole database if your system allows multiple queries in one statement. The server interprets them as separate queries:

    select * from users where username=''
    drop database
    select ''

So in this case it would be enough to either remove single quotes from the string or escape them.
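As a rough sketch of the escaping option: standard SQL writes a literal single quote inside a quoted string by doubling it. The helper names below are made up for illustration, and this handles only single quotes. Some databases treat other characters (backslash, for example) specially too, so in real code the database’s own escape function or a parameterized query is the safer choice.

```csharp
using System;

public static class QuoteDemo
{
    // Doubles every single quote so it is read as text, not as the end of the string.
    public static string EscapeSingleQuotes(string value)
    {
        return value.Replace("'", "''");
    }

    // Builds the same query as in the example above, but with the name escaped first.
    public static string BuildQuery(string userName)
    {
        return "select * from users where username='" + EscapeSingleQuotes(userName) + "'";
    }

    public static void Main()
    {
        // The attack string from above now stays inside the quoted value.
        Console.WriteLine(BuildQuery("';drop database;select '"));
        // prints: select * from users where username=''';drop database;select '''
    }
}
```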

Most systems that rely on a certain syntax (SQL, URL, RegEx, …) have dedicated escape and unescape functions. They ensure that all special characters for that system are escaped properly and the string is safe to be treated as “text”.

Some examples: real_escape_string (PHP mysqli), Uri.EscapeUriString and Regex.Escape (.NET).
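A small sketch of two such functions from the .NET base library. Note that I use Uri.EscapeDataString here rather than EscapeUriString, because it also escapes reserved characters like “&” and is the usual choice for a single value such as a name. The class and method names are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapeDemo
{
    // Makes a name safe to embed in a regular expression as literal text.
    public static string ForRegex(string name)
    {
        return Regex.Escape(name);
    }

    // Makes a name safe to embed in a URL query value.
    public static string ForUrl(string name)
    {
        return Uri.EscapeDataString(name);
    }

    public static void Main()
    {
        Console.WriteLine(ForRegex("1+1=2?")); // prints 1\+1=2\?
        Console.WriteLine(ForUrl("a b&c"));    // prints a%20b%26c
    }
}
```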

Short answer: Unity should not have issues with any string the user can enter in a text box, but there’s no harm in being safe, especially if a problem would have wide-reaching consequences (broken saves or, as in your case, online data affecting other users).

Longer answer…

This sort of question ultimately comes down to how much you trust Unity (or other software) not to have issues with your strings, versus what the actual consequences would be if it did.

In theory, Unity shouldn’t crash or do anything nasty when you give it a string the user entered, but there’s always a chance there’s a bug - or there may be a bug later but not now, or somehow even a bug that is affected by the OS or breaks your server code somehow! You need to decide whether it’s worth risking the bug.

The question to ask is: if my user accidentally or deliberately created a ‘dodgy’ string, and it caused the game to die somehow, would it:

  • Be an entirely temporary thing - game crashes, reboots, user continues and doesn’t do it again
  • Get stored in a save game or something - i.e. game crashes, reboots, game crashes again, angry user
  • Get uploaded to a server so other users see it - game crashes for EVERYONE!

Maybe in the first case you wouldn’t worry too much. Certainly if you’re uploading any form of data (as with your question) it should be mega validated, as a problem caused by 1 user could affect 1000s of people!

“The only validation I do is maximum 20 characters”

This is quite wrong. You MUST limit it to (say) a-zA-Z only.

“Are user generated strings safe to print?”

The answer is NO, it is absolutely NOT safe to print such strings (or to let them exist).

Note too that, setting aside the “security” issue, of course you want to limit the characters. Think about, say, Twitter, Facebook or any online system where you have a nickname: there is some limited choice of characters, and you can’t use “silly” characters. Note also that you should actually limit it “as it is being typed”, so that you cannot even type “%” or whatever. Hope it helps.
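A minimal sketch of filtering “as it is being typed”. The method has the same shape as the callback Unity’s UI InputField exposes as onValidateInput; that hookup, and the idea that returning '\0' makes the field drop the character, are my assumptions, so check them against your Unity version. The whitelist (letters, digits, space, hyphen) mirrors the one from the question’s edit, and `NicknameInput` is a made-up name.

```csharp
using System;

public static class NicknameInput
{
    // Returns the typed character if it is allowed, or '\0' to reject it.
    // The text and charIndex parameters are unused here; they are kept only
    // so the signature matches the assumed callback shape.
    public static char ValidateChar(string text, int charIndex, char addedChar)
    {
        bool allowed = (addedChar >= 'a' && addedChar <= 'z')
                    || (addedChar >= 'A' && addedChar <= 'Z')
                    || (addedChar >= '0' && addedChar <= '9')
                    || addedChar == ' '
                    || addedChar == '-';
        return allowed ? addedChar : '\0';
    }

    public static void Main()
    {
        Console.WriteLine(ValidateChar("", 0, 'a') == 'a');  // prints True
        Console.WriteLine(ValidateChar("", 0, '%') == '\0'); // prints True
    }
}
```

Filtering in the client is only a convenience for the user; the server still has to apply the same rule to whatever it receives.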

Unity uses only five “custom” HTML-like tags.
http://docs.unity3d.com/Manual/StyledText.html

So you can turn off the Rich Text option on the Text component.
Also (as mentioned above) test inputs such as ", ;, /*, \n and \t.
I don’t think it is really dangerous to have user generated strings.
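If rich text has to stay enabled for some reason, another option is to strip the tags from the name before displaying it. This is only a sketch: the tag names in the pattern are my assumption based on the manual page linked above, so adjust the list to whatever your Unity version actually supports. `RichTextFilter` is a made-up name.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RichTextFilter
{
    // Matches opening and closing forms such as <b>, </i>, <size=40>, <color=#ff0000>.
    private static readonly Regex TagPattern = new Regex(
        @"</?(b|i|size|color|material|quad)\b[^>]*>", RegexOptions.IgnoreCase);

    public static string StripTags(string input)
    {
        string previous;
        string current = input;
        // Repeat until nothing changes, so "<<b>b>" cannot reassemble into a tag.
        do
        {
            previous = current;
            current = TagPattern.Replace(previous, "");
        } while (current != previous);
        return current;
    }

    public static void Main()
    {
        Console.WriteLine(StripTags("<color=red>Ad<b>min</b></color>")); // prints "Admin"
    }
}
```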