Take a string and put spaces between words in that string

I want to convert for example “ParseThisString” into “Parse This String”. How would I be able to do that in the most efficient way possible? Is there some kind of string format command for this or would I have to split the string into different words first then join them together with spaces?

StackOverflow:

    // \B keeps the first capital from matching, so the result has no leading space.
    string s = Regex.Replace("ParseThisString", "(\\B[A-Z])", " $1", RegexOptions.Compiled);        // using System.Text.RegularExpressions;

@zulo3d gave you a great direct answer for your problem.

I’d like to offer a UnityEditor-specific alternative, since I’m unsure of your exact use case (you didn’t say whether you only need to support camel case; your example just implied it).

Since this is Unity, you may be trying to ‘nicify’ your string the way Unity does internally (see how variable names are shown in the editor). There is an editor-only function that does this for the various variable name formats: ObjectNames.NicifyVariableName.

Of course we should stress that this is an editor-only solution that cannot be used at runtime, just so that readers who missed the UnityEditor-specific part aren’t disappointed :)

edit
ps: When you want to play around with regular expressions, check out regex101.com. It’s a really great tool to set up and test expressions on the fly and see what groups and matches they generate on a given input.

This can’t properly handle inputs such as “UIElement” (would return “U I Element”) or “Array2D” (would return “Array2 D”).

This can handle such cases:

// Requires: using System.Text; (for StringBuilder)
public static string NicifyVariableName(string input)
{
    int length = input.Length;
    switch(length)
    {
        case 0:
            return "";
        case 1:
            return input.ToUpper();
    }
  
    int index = 0;
    int stop = length;

    // skip past prefixes like "m_"
    if(length >= 3 && input[1] == '_')
    {
        index = 2;
    }
    // handle property backing field
    else if(input[0] == '<' && input.EndsWith(">k__BackingField"))
    {
        index = 1;
        stop = length - 16;
    }
    // skip past "_" prefix
    else if(input[0] == '_')
    {
        index = 1;
    }

    var stringBuilder = new StringBuilder();

    // first letter should always be upper case
    stringBuilder.Append(char.ToUpper(input[index]));

    // skipping first letter which was already capitalized
    for(index++; index < stop; index++)
    {
        char @char = input[index];
      
        // If this character is a number...
        if(char.IsNumber(@char))
        {
            // ...and previous character is a letter...
            if(char.IsLetter(input[index - 1]))
            {
                // ...add a space before this character.
                stringBuilder.Append(' ');
                //e.g. "Id1" => "Id 1", "FBI123" => "FBI 123", "Array2D" => "Array 2D"
            }
        }
        // If this character is an upper case letter...
        else if(char.IsUpper(@char))
        {
            // ...and previous character is a lower case letter...
            if(char.IsLower(input[index - 1])) //IsLower returns false for numbers, so no need to check && !IsNumber separately
            {
                // ...add a space before it.
                stringBuilder.Append(' ');
                //e.g. "TestID" => "Test ID", "Test3D" => "Test 3D"
            }
            // ...or if the next character is a lower case letter
            // and previous character is not a "split point" character (space, slash, underscore etc.)
            else if(length > index + 1 && char.IsLower(input[index + 1])) //IsLower returns false for numbers, so no need to check && !IsNumber separately
            {
                switch(input[index - 1])
                {
                    case ' ':
                    case '/':
                    case '\\':
                    case '_':
                    case '-':
                        break;
                    default:
                        // ...add a space before it.
                        stringBuilder.Append(' ');
                        // e.g. "FBIDatabase" => "FBI Database", "My3DFx" => "My 3D Fx"
                        break;
                }
            }
        }
        // replace underscores with the space character...
        else if(@char == '_')
        {
            // ...unless previous character is a split point
            switch(input[index - 1])
            {
                case ' ':
                case '/':
                case '\\':
                case '_':
                case '-':
                    break;
                default:
                    stringBuilder.Append(' ');
                    break;
            }

            continue;
        }
      
        stringBuilder.Append(@char);
    }

    return stringBuilder.ToString();
}

Proof.

Actually the SO question linked by zulo has a regex solution for that as well, in a different answer.

You can also check out this tutorial of mine. It contains a class called SimpleSanitizer, which does what you need. It’s not regex but it’s small (edit: I meant to say for a non-regex solution), customizable, and extendable. You can find it under a spoiler button.

Edit:
It scans the original string one character at a time, but also looks ahead one character, and then uses state machine logic to determine when to insert space, capitalize a letter, or substitute underscore in a new string it’s producing. For capitalizing letters, the underlying state assumes a “shift” bit (like the shift key), so it ‘cleverly’ decides when to “press the shift” (which is then consumed after parsing a letter), and also won’t separate digits from each other, but will separate letters from numbers. A very simple machine that does its job in one go without producing (too much) garbage*, making it suitable for heavy duty operation.

  • Though it could be made even more garbage-friendly if ScannerState was persistent between calls.
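
For anyone who wants to see the idea without opening the tutorial, here is a minimal sketch of that kind of one-pass scanner. To be clear, this is not the actual SimpleSanitizer: the class name `SanitizerSketch`, the method `Sanitize`, and the exact boundary rules are my own assumptions, written only to illustrate the look-ahead and the "shift" bit described above. It is deliberately simplified and leaves runs of capitals such as "UIElement" unsplit.

```csharp
using System.Text;

// Hypothetical illustration only; not the SimpleSanitizer class from the tutorial.
public static class SanitizerSketch
{
    public static string Sanitize(string input)
    {
        if (string.IsNullOrEmpty(input)) return string.Empty;

        var sb = new StringBuilder(input.Length + 8);
        bool shift = true; // "shift key" is armed, so the first letter gets capitalized

        for (int i = 0; i < input.Length; i++)
        {
            char cur = input[i];
            // One character of look-ahead; '\0' marks the end of the input.
            char next = i + 1 < input.Length ? input[i + 1] : '\0';

            if (cur == '_')
            {
                // Substitute the underscore with a single space and arm the shift.
                if (sb.Length > 0 && sb[sb.Length - 1] != ' ') sb.Append(' ');
                shift = true;
                continue;
            }

            if (char.IsLetter(cur))
            {
                sb.Append(shift ? char.ToUpperInvariant(cur) : cur);
                shift = false; // the shift is consumed by the letter
            }
            else
            {
                sb.Append(cur);
            }

            // Word boundary: lower -> upper, letter -> digit, or digit -> letter.
            // Digits next to digits never trigger a boundary.
            bool boundary =
                (char.IsLower(cur) && char.IsUpper(next)) ||
                (char.IsLetter(cur) && char.IsDigit(next)) ||
                (char.IsDigit(cur) && char.IsLetter(next));

            if (boundary)
            {
                sb.Append(' ');
                shift = true;
            }
        }

        // Drop a trailing space left behind by trailing underscores.
        while (sb.Length > 0 && sb[sb.Length - 1] == ' ') sb.Length--;

        return sb.ToString();
    }
}
```

With these rules, `Sanitize("parse_this_string")` and `Sanitize("parseThisString")` both give "Parse This String", and `Sanitize("level12Boss")` gives "Level 12 Boss". The only allocations are the StringBuilder and the final string.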

GPT-4 generated one that doesn’t rely on LINQ.

// Requires: using System.Text.RegularExpressions;
public class StringFormatter
{
    public static string AddSpacesToSentence(string text, bool preserveAcronyms = true)
    {
        if (string.IsNullOrWhiteSpace(text))
            return string.Empty;
        
        // This pattern will look for places in the string where a lowercase letter is followed by an uppercase letter and insert a space.
        // The pattern also considers the scenario where uppercase letters are adjacent (considered as acronyms) and optionally prevents adding spaces between them.
        string pattern = preserveAcronyms ? "(?<=[a-z])(?=[A-Z0-9])|(?<=[A-Z0-9])(?=[A-Z][a-z])" : "(?<=[a-z])(?=[A-Z])";
        return Regex.Replace(text, pattern, " ");
    }
}

@Ryiah Great use case for ChatGPT!

One small detail that I think is still missing from that is returning “ID 1” for the input of “ID1”. But GPT-4 was able to quickly remedy that as well when I asked it politely:

public static string AddSpacesToSentence(string text)
{
    const string pattern = "(?<=[a-z])(?=[A-Z0-9])|(?<=[A-Z])(?=[0-9])|(?<=[A-Z0-9])(?=[A-Z][a-z])";
    return Regex.Replace(text, pattern, " ");
}

Note that processing such long regular expressions will probably be quite inefficient - but as long as it doesn’t need to go through hundreds of lines of text in one go, I think it should suffice just fine.
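
To make it easier to see what the three alternatives in that pattern do, here is a small self-contained sketch that runs it over the inputs mentioned in this thread. The class name `SpacingDemo` and the `PrintSamples` helper are just names I picked for the example; the pattern itself is copied unchanged from the post above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SpacingDemo
{
    // Alternative 1: lower case letter followed by an upper case letter or digit.
    // Alternative 2: upper case letter followed by a digit.
    // Alternative 3: upper case letter or digit followed by "Upper + lower" (start of a new word).
    const string Pattern =
        "(?<=[a-z])(?=[A-Z0-9])|(?<=[A-Z])(?=[0-9])|(?<=[A-Z0-9])(?=[A-Z][a-z])";

    public static string AddSpacesToSentence(string text)
    {
        // Every match is zero-width, so Replace only inserts spaces and removes nothing.
        return Regex.Replace(text, Pattern, " ");
    }

    public static void PrintSamples()
    {
        string[] samples = { "ParseThisString", "UIElement", "Array2D", "ID1", "FBIDatabase" };
        foreach (string sample in samples)
        {
            Console.WriteLine(sample + " => " + AddSpacesToSentence(sample));
        }
        // ParseThisString => Parse This String
        // UIElement => UI Element
        // Array2D => Array 2D
        // ID1 => ID 1
        // FBIDatabase => FBI Database
    }
}
```

Note that the character classes are ASCII-only, so accented or non-Latin letters are never treated as word boundaries by this pattern.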

A regex can be optimized with the RegexOptions.Compiled flag, which compiles the pattern to IL when the Regex object is constructed instead of interpreting it on each call. Keeping the instance in a static field means that cost is only paid once:

private static readonly Regex SpacesRegex = new Regex("(?<=[a-z])(?=[A-Z0-9])|(?<=[A-Z])(?=[0-9])|(?<=[A-Z0-9])(?=[A-Z][a-z])", RegexOptions.Compiled);
public static string AddSpacesToSentence(string text)
{
    return SpacesRegex.Replace(text, " ");
}

Even with this optimization, in my experience it can take a long time for complex regular expressions to process 1000+ lines of text, and manually going through the string character-by-character can give orders of magnitude better performance. It probably depends a lot on the particulars of the situation though.
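
As a rough illustration of the character-by-character approach, here is a sketch of a hand-written scan that applies the same three rules as the regex from the earlier posts (ASCII letters and digits only). The names `ManualSpacer`, `AddSpaces`, and `MeasureMilliseconds` are made up for this example, and the timing helper is only a crude Stopwatch loop, so treat any numbers you get from it as indicative rather than a proper benchmark.

```csharp
using System;
using System.Diagnostics;
using System.Text;

// Hypothetical helper, written for this example.
public static class ManualSpacer
{
    static bool IsAsciiUpper(char c) { return c >= 'A' && c <= 'Z'; }
    static bool IsAsciiLower(char c) { return c >= 'a' && c <= 'z'; }
    static bool IsAsciiDigit(char c) { return c >= '0' && c <= '9'; }

    public static string AddSpaces(string text)
    {
        if (string.IsNullOrEmpty(text)) return string.Empty;

        var sb = new StringBuilder(text.Length + 8);
        sb.Append(text[0]);

        for (int i = 1; i < text.Length; i++)
        {
            char prev = text[i - 1];
            char cur = text[i];
            bool nextIsLower = i + 1 < text.Length && IsAsciiLower(text[i + 1]);

            bool split =
                // Rule 1: lower case letter followed by an upper case letter or digit.
                (IsAsciiLower(prev) && (IsAsciiUpper(cur) || IsAsciiDigit(cur))) ||
                // Rule 2: upper case letter followed by a digit.
                (IsAsciiUpper(prev) && IsAsciiDigit(cur)) ||
                // Rule 3: upper case letter or digit followed by "Upper + lower".
                ((IsAsciiUpper(prev) || IsAsciiDigit(prev)) && IsAsciiUpper(cur) && nextIsLower);

            if (split) sb.Append(' ');
            sb.Append(cur);
        }

        return sb.ToString();
    }

    // Crude timing loop: runs the converter 'iterations' times and returns elapsed milliseconds.
    public static long MeasureMilliseconds(Func<string, string> convert, string input, int iterations)
    {
        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) convert(input);
        watch.Stop();
        return watch.ElapsedMilliseconds;
    }
}
```

You could pass both `ManualSpacer.AddSpaces` and the regex-based method to `MeasureMilliseconds` with the same input and iteration count to compare them on your own data and target platform.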

Regex is definitely cool, but I think it’s more suited for a collaborative environment, i.e. someone else develops it, tests it, and shares it with the engineer who is at the moment doing something much more concrete. This is because if a regex is complicated, taking it lightly is not the best idea, and yet development of it can take a whole day, and purge a programming mind of any other context.

For this reason alone, I’m always more open toward simpler, more straightforward solutions (unless the regex itself is very simple, or it is easy to find a working expression somewhere online, which is again similar to having a dedicated colleague who wasn’t too lazy to test it thoroughly with a large amount of data and then optimize it for better performance).

And let’s not talk about maintenance or a new programmer trying to understand the codebase. Regex is very cool, but perhaps too cool i.e. liquid nitrogen cool.