.NET - How can you split a "caps" delimited string into an array?

Question

How do I go from this string: "ThisIsMyCapsDelimitedString"

...to this string: "This Is My Caps Delimited String"

Fewest lines of code in VB.net is preferred but C# is also welcome.

Cheers!

What happens when you have to deal with "OldMacDonaldAndMrO'TooleWentToMcDonalds"? — Grant Wagner, Sep 30 '08 at 22:09
It's only going to see limited use. I'll mainly just be using it to parse variable names such as ThisIsMySpecialVariable, — Matias Nino, Sep 30 '08 at 22:18
This worked for me: `Regex.Replace(s, "([A-Z0-9]+)", " $1").Trim()`. And if you want to split on each capital letter, just remove the plus. — Mladen B., Mar 25 '19 at 20:17

Markus Jarderot · Accepted Answer · 2017-03-29T11:47:08.960

187

I made this a while ago. It matches each component of a CamelCase name.

/([A-Z]+(?=$|[A-Z][a-z])|[A-Z]?[a-z]+)/g
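
Since the question title asks for an array, one way to apply this pattern in .NET is `Regex.Matches`. The following is a minimal sketch rather than part of the original answer; the names `CamelCaseWords` and `Split` are made up for illustration.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class CamelCaseWords
{
    // Same pattern as above, without the /.../g delimiters
    // (those belong to other regex flavors, not to .NET).
    private static readonly Regex Word =
        new Regex(@"[A-Z]+(?=$|[A-Z][a-z])|[A-Z]?[a-z]+");

    // Returns each matched component as one array element.
    public static string[] Split(string s)
    {
        return Word.Matches(s)
                   .Cast<Match>()
                   .Select(m => m.Value)
                   .ToArray();
    }
}
```

Joining the result with `string.Join(" ", CamelCaseWords.Split("ThisIsMyCapsDelimitedString"))` should give "This Is My Caps Delimited String".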

For example:

"SimpleHTTPServer" => ["Simple", "HTTP", "Server"]
"camelCase" => ["camel", "Case"]

To just insert spaces between the words instead:

Regex.Replace(s, "([a-z](?=[A-Z])|[A-Z](?=[A-Z][a-z]))", "$1 ")

If you need to handle digits:

/([A-Z]+(?=$|[A-Z][a-z]|[0-9])|[A-Z]?[a-z]+|[0-9]+)/g

Regex.Replace(s,"([a-z](?=[A-Z]|[0-9])|[A-Z](?=[A-Z][a-z]|[0-9])|[0-9](?=[^0-9]))","$1 ")
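
As a rough sketch of how the two replace patterns could be wrapped in one helper (the `CamelCaseSpacer` class and the `handleDigits` parameter are hypothetical names, not from the answer):

```csharp
using System.Text.RegularExpressions;

public static class CamelCaseSpacer
{
    // Letters only: split lower->Upper, and before the last capital of an acronym.
    private const string LettersOnly =
        "([a-z](?=[A-Z])|[A-Z](?=[A-Z][a-z]))";

    // Digit-aware variant: a run of digits becomes its own word.
    private const string WithDigits =
        "([a-z](?=[A-Z]|[0-9])|[A-Z](?=[A-Z][a-z]|[0-9])|[0-9](?=[^0-9]))";

    // Each match is the last character of a word; "$1 " appends a space after it.
    public static string InsertSpaces(string s, bool handleDigits = false)
    {
        return Regex.Replace(s, handleDigits ? WithDigits : LettersOnly, "$1 ");
    }
}
```

With these patterns, `InsertSpaces("Take5")` leaves the string unchanged, while `InsertSpaces("Take5", true)` gives "Take 5".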

edited Mar 29 '17 at 11:47

answered Sep 30 '08 at 22:59

Markus Jarderot

CamelCase! That's what it was called! I love it! Thanks much! – Matias Nino Sep 30 '08 at 23:27
Actually camelCase has a leading lowercase letter. What you're referring to here is PascalCase. – Drew Noakes Feb 12 '09 at 14:05
...and when you refer to something that can be "camel case" or "pascal case" it is called "intercapped" – Christopher May 06 '10 at 16:16
Doesn't split "Take5", which would fail my use case – PandaWood Mar 29 '17 at 00:50
@PandaWood Digits was not in the question, so my answer did not account for them. I've added a variant of the patterns that accounts for digits. – Markus Jarderot Mar 29 '17 at 11:50
I have to correct some bad info. CamelCase does not have to start with a lower-case letter. It can start with either. PascalCase is literally, by definition, Upper CamelCase. – John Lord Jan 01 '21 at 00:35
I would change the last part from `[0-9](?=[^0-9])` to `[0-9](?=[A-Za-z])` so that if you have a space after a number it won't double the space. (Sure, you're not supposed to have a space in a camel case string, but that makes it more general and accept complete sentences which contain camel case expressions). – LionGod8 Feb 23 '23 at 03:26

Wayne · Answer 2 · 2008-10-01T01:18:16.770

41

Regex.Replace("ThisIsMyCapsDelimitedString", "(\\B[A-Z])", " $1")
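
A short sketch of the same call using a C# verbatim string, which avoids doubling the backslash (in VB.NET the backslash is not an escape character in string literals, so the pattern can be written as is). The `WordBoundaryDemo` class is only an illustrative wrapper:

```csharp
using System.Text.RegularExpressions;

public static class WordBoundaryDemo
{
    public static string Spaced(string s)
    {
        // "(\\B[A-Z])" and @"(\B[A-Z])" are the same pattern.
        // \B matches only where there is NO word boundary, so the capital at
        // the very start of the string is skipped and no leading space appears.
        return Regex.Replace(s, @"(\B[A-Z])", " $1");
    }
}
```

`Spaced("ThisIsMyCapsDelimitedString")` gives "This Is My Caps Delimited String". Every capital inside a word gets its own space, so `Spaced("ANZAC")` gives "A N Z A C".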

edited Oct 01 '08 at 01:18

answered Sep 30 '08 at 22:14

Wayne

This is the best solution so far, but you need to use \\B to compile. Otherwise the compiler tries to treat the \B as an escape sequence. – Ferruccio Oct 01 '08 at 00:07
Nice solution. Can anyone think of a reason that this shouldn't be the accepted answer? Is it less capable or less performant? – Drew Noakes Aug 25 '10 at 00:34
This one treats consecutive caps as separate words (e.g. ANZAC is 5 words), whereas MizardX's answer treats it (correctly IMHO) as one word. – Ray Dec 28 '11 at 10:50
@Ray, I'd argue that "ANZAC" should be written as "Anzac" to be considered a pascal case word since it's not English-case. – Sam Jun 20 '14 at 04:16
@Sam ANZAC is an Acronym for Australia (and) New Zealand Army Corps so should be in all caps. – Neaox Aug 05 '14 at 04:20
@Neaox, in English it should be, but this isn't acronym-case or normal-english-case; it's caps-delimited. If the source text should be capitalised the same way that it is in normal English, then other letters shouldn't be capitalised either. For example, why should the "i" in "is" be capitalised to fit the caps-delimited format but not the "NZAC" in "ANZAC"? Strictly speaking, if you interpret "ANZAC" as caps-delimited then it is 5 words, one for each letter. – Sam Aug 05 '14 at 04:32
This still fails on "Take5" and converts "Arg20" to "arg 20" - fails 3 out of my 4 tests – PandaWood Mar 29 '17 at 01:03
@Neaox The [official capitalization guidelines for C#](https://learn.microsoft.com/en-us/dotnet/standard/design-guidelines/capitalization-conventions) say that even acronyms (and other initialisms) should be lower case after the first letter. It gives the example of `HtmlTag`, even though HTML would be upper case in every situation ANZAC would be. However, it does make an exception for two-level acronyms, giving `IOStream` as an example (for IO Stream). – Arthur Tacca Jan 07 '19 at 11:49

score 21 · Answer 3 · answered Nov 15 '08 at 00:31

Great answer, MizardX! I tweaked it slightly to treat numerals as separate words, so that "AddressLine1" would become "Address Line 1" instead of "Address Line1":

Regex.Replace(s, "([a-z](?=[A-Z0-9])|[A-Z](?=[A-Z][a-z]))", "$1 ")
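
To make the difference concrete, here is a small side-by-side sketch of the letters-only pattern and the tweaked one (`NumeralTweakDemo`, `Original`, and `Tweaked` are illustrative names only):

```csharp
using System.Text.RegularExpressions;

public static class NumeralTweakDemo
{
    // Letters-only pattern from the accepted answer.
    public static string Original(string s) =>
        Regex.Replace(s, "([a-z](?=[A-Z])|[A-Z](?=[A-Z][a-z]))", "$1 ");

    // Tweaked pattern: a lowercase letter followed by a digit also ends a word.
    public static string Tweaked(string s) =>
        Regex.Replace(s, "([a-z](?=[A-Z0-9])|[A-Z](?=[A-Z][a-z]))", "$1 ");
}
```

`Original("AddressLine1")` gives "Address Line1" and `Tweaked("AddressLine1")` gives "Address Line 1". Note that the tweak only adds a break before a digit that follows a lowercase letter; nothing is inserted after a digit, so "Line1Address" becomes "Line 1Address".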

answered Nov 15 '08 at 00:31

JoshL

Great addition! I suspect not a few people will be surprised by the accepted answer's handling of numbers in strings. :) – Jordan Gray Nov 12 '12 at 14:37
I know it's been almost 8 years since you posted this, but it worked perfectly for me, too. :) The numbers tripped me up at first. – Michael Armes Aug 26 '16 at 21:05
The only answer that passes my 2 outlier tests: "Take5" -> "Take 5", "PublisherID" -> "Publisher ID". I want to upvote this twice – PandaWood Mar 29 '17 at 01:08

score 18 · Answer 4 · edited Nov 28 '18 at 20:58

Just for a little variety... Here's an extension method that doesn't use a regex.

public static class CamelSpaceExtensions
{
    public static string SpaceCamelCase(this String input)
    {
        return new string(Enumerable.Concat(
            input.Take(1), // No space before initial cap
            InsertSpacesBeforeCaps(input.Skip(1))
        ).ToArray());
    }

    private static IEnumerable<char> InsertSpacesBeforeCaps(IEnumerable<char> input)
    {
        foreach (char c in input)
        {
            if (char.IsUpper(c)) 
            { 
                yield return ' '; 
            }

            yield return c;
        }
    }
}

edited Nov 28 '18 at 20:58

jpmc26

answered Sep 30 '08 at 22:59

Troy Howard

To avoid using Trim(): before the foreach, declare int counter = -1; inside the loop, add counter++; and change the check to: if (char.IsUpper(c) && counter > 0) – Outside the Box Developer Jun 29 '17 at 23:23
This inserts a space before the 1st char. – Zar Shardan Oct 04 '17 at 14:23
I've taken the liberty of fixing the issue pointed out by @ZarShardan. Please feel free to roll back or edit to your own fix if you dislike the change. – jpmc26 Nov 28 '18 at 20:59
Can this be enhanced to handle abbreviations, for example by adding a space before the last uppercase letter in a series of uppercase letters, e.g. **BOEForecast** => **BOE Forecast**? – Nepaluz Jul 19 '19 at 11:38

score 12 · Answer 5 · answered Sep 30 '08 at 22:13

Grant Wagner's excellent comment aside:

Dim s As String = RegularExpressions.Regex.Replace("ThisIsMyCapsDelimitedString", "([A-Z])", " $1")

answered Sep 30 '08 at 22:13

Pseudo Masochist

Good point... Please feel free to insert the .substring(), .trimstart(), .trim(), .remove(), etc. of your choice. :) – Pseudo Masochist Oct 03 '08 at 22:40

Dan Malcolm · Answer 6 · 2016-02-25T22:23:07.110

I needed a solution that supports acronyms and numbers. This Regex-based solution treats the following patterns as individual "words":

A capital letter followed by lowercase letters
A sequence of consecutive numbers
Consecutive capital letters (interpreted as acronyms) - a new word can begin using the last capital, e.g. HTMLGuide => "HTML Guide", "TheATeam" => "The A Team"

You could do it as a one-liner:

Regex.Replace(value, @"(?<!^)((?<!\d)\d|(?(?<=[A-Z])[A-Z](?=[a-z])|[A-Z]))", " $1")

A more readable approach might be better:

using System.Text.RegularExpressions;

namespace Demo
{
    public class IntercappedStringHelper
    {
        private static readonly Regex SeparatorRegex;

        static IntercappedStringHelper()
        {
            const string pattern = @"
                (?<!^) # Not start
                (
                    # Digit, not preceded by another digit
                    (?<!\d)\d 
                    |
                    # Upper-case letter, followed by lower-case letter if
                    # preceded by another upper-case letter, e.g. 'G' in HTMLGuide
                    (?(?<=[A-Z])[A-Z](?=[a-z])|[A-Z])
                )";

            var options = RegexOptions.IgnorePatternWhitespace | RegexOptions.Compiled;

            SeparatorRegex = new Regex(pattern, options);
        }

        public static string SeparateWords(string value, string separator = " ")
        {
            return SeparatorRegex.Replace(value, separator + "$1");
        }
    }
}

Here's an extract from the (xUnit) tests:

[Theory]
[InlineData("PurchaseOrders", "Purchase-Orders")]
[InlineData("purchaseOrders", "purchase-Orders")]
[InlineData("2Unlimited", "2-Unlimited")]
[InlineData("The2Unlimited", "The-2-Unlimited")]
[InlineData("Unlimited2", "Unlimited-2")]
[InlineData("222Unlimited", "222-Unlimited")]
[InlineData("The222Unlimited", "The-222-Unlimited")]
[InlineData("Unlimited222", "Unlimited-222")]
[InlineData("ATeam", "A-Team")]
[InlineData("TheATeam", "The-A-Team")]
[InlineData("TeamA", "Team-A")]
[InlineData("HTMLGuide", "HTML-Guide")]
[InlineData("TheHTMLGuide", "The-HTML-Guide")]
[InlineData("TheGuideToHTML", "The-Guide-To-HTML")]
[InlineData("HTMLGuide5", "HTML-Guide-5")]
[InlineData("TheHTML5Guide", "The-HTML-5-Guide")]
[InlineData("TheGuideToHTML5", "The-Guide-To-HTML-5")]
[InlineData("TheUKAllStars", "The-UK-All-Stars")]
[InlineData("AllStarsUK", "All-Stars-UK")]
[InlineData("UKAllStars", "UK-All-Stars")]

+ 1 for explaining the regex and making it this readable. And I learned something new. There is a free-spacing mode and comments in .NET Regex. Thank you! — Felix Keil, Aug 28 '15 at 11:33

score 4 · Answer 7 · answered Oct 01 '08 at 02:44

For more variety, using plain old C# objects, the following produces the same output as @MizardX's excellent regular expression.

public string FromCamelCase(string camel)
{   // omitted checking camel for null
    StringBuilder sb = new StringBuilder();
    int upperCaseRun = 0;
    foreach (char c in camel)
    {   // append a space only if we're not at the start
        // and we're not already in an all caps string.
        if (char.IsUpper(c))
        {
            if (upperCaseRun == 0 && sb.Length != 0)
            {
                sb.Append(' ');
            }
            upperCaseRun++;
        }
        else if( char.IsLower(c) )
        {
            if (upperCaseRun > 1) //The first new word will also be capitalized.
            {
                sb.Insert(sb.Length - 1, ' ');
            }
            upperCaseRun = 0;
        }
        else
        {
            upperCaseRun = 0;
        }
        sb.Append(c);
    }

    return sb.ToString();
}

Wow, that's ugly. Now I remember why I so dearly love regex! +1 for effort, though. ;) — Mark Brackett, Oct 01 '08 at 03:22

score 3 · Answer 8 · answered Aug 03 '15 at 20:27

Below is a prototype that converts the following to Title Case:

snake_case
camelCase
PascalCase
sentence case
Title Case (keep current formatting)

Obviously you would only need the "ToTitleCase" method yourself.

using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

public class Program
{
    public static void Main()
    {
        var examples = new List<string> { 
            "THEQuickBrownFox",
            "theQUICKBrownFox",
            "TheQuickBrownFOX",
            "TheQuickBrownFox",
            "the_quick_brown_fox",
            "theFOX",
            "FOX",
            "QUICK"
        };

        foreach (var example in examples)
        {
            Console.WriteLine(ToTitleCase(example));
        }
    }

    private static string ToTitleCase(string example)
    {
        var fromSnakeCase = example.Replace("_", " ");
        var lowerToUpper = Regex.Replace(fromSnakeCase, @"(\p{Ll})(\p{Lu})", "$1 $2");
        var sentenceCase = Regex.Replace(lowerToUpper, @"(\p{Lu}+)(\p{Lu}\p{Ll})", "$1 $2");
        return new CultureInfo("en-US", false).TextInfo.ToTitleCase(sentenceCase);
    }
}

The console output would be as follows:

THE Quick Brown Fox
The QUICK Brown Fox
The Quick Brown FOX
The Quick Brown Fox
The Quick Brown Fox
The FOX
FOX
QUICK

Blog Post Referenced

score 2 · Answer 9 · answered Sep 30 '08 at 22:14

string s = "ThisIsMyCapsDelimitedString";
string t = Regex.Replace(s, "([A-Z])", " $1").Substring(1);

answered Sep 30 '08 at 22:14

Ferruccio

I knew there would be an easy RegEx way... I've got to start using it more. – Max Schmeling Sep 30 '08 at 22:17
Not a regex guru but what happens with "HeresAWTFString"? – Nick Sep 30 '08 at 22:24
You get "Heres A W T F String" but that's exactly what Matias Nino asked for in the question. – Max Schmeling Sep 30 '08 at 22:31
Yeah he needs to add that "multiple adjacent capitals are left alone". Which is pretty obviously required in many cases eg "PublisherID" here goes to "Publisher I D" which is awful – PandaWood Mar 29 '17 at 00:59

score 2 · Answer 10 · answered Oct 04 '17 at 14:28

Regex is about 10-12 times slower than a simple loop:

    public static string CamelCaseToSpaceSeparated(this string str)
    {
        if (string.IsNullOrEmpty(str))
        {
            return str;
        }

        var res = new StringBuilder();

        res.Append(str[0]);
        for (var i = 1; i < str.Length; i++)
        {
            if (char.IsUpper(str[i]))
            {
                res.Append(' ');
            }
            res.Append(str[i]);

        }
        return res.ToString();
    }
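
The ratio quoted above will depend on the runtime, the pattern, and the input, so it is worth measuring locally. Below is a rough timing sketch. It is not the answerer's benchmark; the `\B[A-Z]` pattern is borrowed from another answer in this thread, and `SplitTiming`, `WithRegex`, `WithLoop`, and `Time` are illustrative names.

```csharp
using System;
using System.Diagnostics;
using System.Text;
using System.Text.RegularExpressions;

public static class SplitTiming
{
    // Regex candidate: space before every capital that is not at a word boundary.
    public static string WithRegex(string s) =>
        Regex.Replace(s, @"(\B[A-Z])", " $1");

    // Loop candidate: same idea as the extension method above.
    public static string WithLoop(string s)
    {
        if (string.IsNullOrEmpty(s)) return s;
        var sb = new StringBuilder(s.Length * 2);
        sb.Append(s[0]);
        for (var i = 1; i < s.Length; i++)
        {
            if (char.IsUpper(s[i])) sb.Append(' ');
            sb.Append(s[i]);
        }
        return sb.ToString();
    }

    // Times `iterations` calls of `f` on `input`; returns elapsed milliseconds.
    public static long Time(Func<string, string> f, string input, int iterations)
    {
        f(input); // warm-up call, so one-time JIT and regex setup cost is mostly excluded
        var sw = Stopwatch.StartNew();
        for (var i = 0; i < iterations; i++) f(input);
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }
}
```

A comparison could then look like `SplitTiming.Time(SplitTiming.WithRegex, "ThisIsMyCapsDelimitedString", 1000000)` against the same call with `SplitTiming.WithLoop`. The two candidates agree on plain letter-only identifiers, which is the case the question is about.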

score 1 · Answer 11 · answered Sep 30 '08 at 22:18

Naive regex solution. Will not handle O'Conner, and adds a space at the start of the string as well.

string s = "ThisIsMyCapsDelimitedString";
string split = Regex.Replace(s, "[A-Z0-9]", " $&");

answered Sep 30 '08 at 22:18

Geoff

I modded you up, but people generally take a smackdown better if it doesn't start with "naive". – MusiGenesis Sep 30 '08 at 22:42
I don't think that was a smackdown. In this context, naive usually means obvious or simple (i.e. not necessarily the best solution). There is no intention of insult. – Ferruccio Sep 30 '08 at 23:58

Leniel Maccaferri · Answer 12 · 2022-12-17T14:11:47.833

For C#, building on this awesome answer by @ZombieSheep, but now using compiled regexes for better performance:

public static class StringExtensions
{
    private static readonly Regex _regex1 = new(@"(\P{Ll})(\P{Ll}\p{Ll})", RegexOptions.Compiled | RegexOptions.CultureInvariant);
    private static readonly Regex _regex2 = new(@"(\p{Ll})(\P{Ll})", RegexOptions.Compiled | RegexOptions.CultureInvariant);

    public static string SplitCamelCase(this string str)
    {
        return _regex2.Replace(_regex1.Replace(str, "$1 $2"), "$1 $2");
    }
}

Sample code:

private static void Main(string[] args)
{
    string str = "ThisIsAPropertyNAMEWithNumber10";

    Console.WriteLine(str.SplitCamelCase());
}

Result:

This Is A Property NAME With Number 10

A plus point of this one is that it also works for strings that contain digits/numbers.

I wouldn't be able to pull that answer together again now, but hearty upvote for the compiled regex. — ZombieSheep, Mar 30 '23 at 12:23

score 0 · Answer 13 · answered Sep 30 '08 at 22:12

There's probably a more elegant solution, but this is what I came up with off the top of my head:

string myString = "ThisIsMyCapsDelimitedString";

for (int i = 1; i < myString.Length; i++)
{
     if (myString[i].ToString().ToUpper() == myString[i].ToString())
     {
          myString = myString.Insert(i, " ");
          i++;
     }
}

score 0 · Answer 14 · answered Aug 25 '14 at 06:04

Try to use

"([A-Z]*[^A-Z]*)"

This also works for strings that mix letters and digits:

Regex.Replace("AbcDefGH123Weh", "([A-Z]*[^A-Z]*)", "$1 ");
Abc Def GH123 Weh  

Regex.Replace("camelCase", "([A-Z]*[^A-Z]*)", "$1 ");
camel Case

answered Aug 25 '14 at 06:04

Erxin

score 0 · Answer 15 · edited May 23 '17 at 10:31

Implementing the pseudocode from: https://stackoverflow.com/a/5796394/4279201

    private static StringBuilder camelCaseToRegular(string i_String)
    {
        StringBuilder output = new StringBuilder();
        int i = 0;
        foreach (char character in i_String)
        {
            if (character <= 'Z' && character >= 'A' && i > 0)
            {
                output.Append(" ");
            }
            output.Append(character);
            i++;
        }
        return output;
    }

score 0 · Answer 16 · answered May 15 '17 at 02:31

To match the position between a non-uppercase character and an uppercase letter (Unicode category Lu): (?<=\P{Lu})(?=\p{Lu})

Dim s = Regex.Replace("CorrectHorseBatteryStaple", "(?<=\P{Lu})(?=\p{Lu})", " ")
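
Because this pattern is zero-width (it matches a position and consumes no characters), it can also be handed to `Regex.Split` to get an array directly, which is what the question title asks for. A minimal C# sketch, with `BoundarySplit` and `ToArray` as made-up names:

```csharp
using System.Text.RegularExpressions;

public static class BoundarySplit
{
    // Zero-width pattern: the position after a non-uppercase character
    // and before an uppercase letter.
    private const string Boundary = @"(?<=\P{Lu})(?=\p{Lu})";

    // Splits at every such position; no characters are removed.
    public static string[] ToArray(string s) => Regex.Split(s, Boundary);
}
```

`BoundarySplit.ToArray("CorrectHorseBatteryStaple")` yields Correct, Horse, Battery, Staple. A run of capitals stays together, so "PublisherID" splits into Publisher and ID, but there is also no break between an acronym and the word after it: "HTMLGuide" comes back as a single element.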

answered May 15 '17 at 02:31

Slai

Patrick from NDepend team · Answer 17 · 2018-01-10T15:10:53.693

A procedural and fast implementation:

  /// <summary>
  /// Get the words in a code <paramref name="identifier"/>.
  /// </summary>
  /// <param name="identifier">The code identifier to extract words from.</param>
  public static string[] GetWords(this string identifier) {
     Contract.Ensures(Contract.Result<string[]>() != null, "returned array of string is not null but can be empty");
     if (identifier == null) { return new string[0]; }
     if (identifier.Length == 0) { return new string[0]; }

     const int MIN_WORD_LENGTH = 2;  //  Ignore one letter or one digit words

     var length = identifier.Length;
     var list = new List<string>(1 + length/2); // Initial capacity: one-char words are discarded, so there cannot be more words than this
     var sb = new StringBuilder();
     CharKind cKindCurrent = GetCharKind(identifier[0]); // length is not zero here
     CharKind cKindNext = length == 1 ? CharKind.End : GetCharKind(identifier[1]);

     for (var i = 0; i < length; i++) {
        var c = identifier[i];
        CharKind cKindNextNext = (i >= length - 2) ? CharKind.End : GetCharKind(identifier[i + 2]);

        // Process cKindCurrent
        switch (cKindCurrent) {
           case CharKind.Digit:
           case CharKind.LowerCaseLetter:
              sb.Append(c); // Append digit or lowerCaseLetter to sb
              if (cKindNext == CharKind.UpperCaseLetter) {
                 goto TURN_SB_INTO_WORD; // Finish word if next char is upper
              }
              goto CHAR_PROCESSED;
           case CharKind.Other:
              goto TURN_SB_INTO_WORD;
           default:  // cKindCurrent is never End here
              Debug.Assert(cKindCurrent == CharKind.UpperCaseLetter);
              break;
        }

        // Here cKindCurrent is UpperCaseLetter
        // Append UpperCaseLetter to sb anyway
        sb.Append(c); 

        switch (cKindNext) {
           default:
              goto CHAR_PROCESSED;

           case CharKind.UpperCaseLetter: 
              //  "SimpleHTTPServer"  when we are at 'P' we need to see that NextNext is 'e' to get the word!
              if (cKindNextNext == CharKind.LowerCaseLetter) {
                 goto TURN_SB_INTO_WORD;
              }
              goto CHAR_PROCESSED;

           case CharKind.End:
           case CharKind.Other:
              break; // goto TURN_SB_INTO_WORD;
        }

        //------------------------------------------------

     TURN_SB_INTO_WORD:
        string word = sb.ToString();
        sb.Length = 0;
        if (word.Length >= MIN_WORD_LENGTH) {  
           list.Add(word);
        }

     CHAR_PROCESSED:
        // Shift left for next iteration!
        cKindCurrent = cKindNext;
        cKindNext = cKindNextNext;
     }

     string lastWord = sb.ToString();
     if (lastWord.Length >= MIN_WORD_LENGTH) {
        list.Add(lastWord);
     }
     return list.ToArray();
  }
  private static CharKind GetCharKind(char c) {
     if (char.IsDigit(c)) { return CharKind.Digit; }
     if (char.IsLetter(c)) {
        if (char.IsUpper(c)) { return CharKind.UpperCaseLetter; }
        Debug.Assert(char.IsLower(c));
        return CharKind.LowerCaseLetter;
     }
     return CharKind.Other;
  }
  enum CharKind {
     End, // For end of string
     Digit,
     UpperCaseLetter,
     LowerCaseLetter,
     Other
  }

Tests:

  [TestCase((string)null, "")]
  [TestCase("", "")]

  // Ignore one letter or one digit words
  [TestCase("A", "")]
  [TestCase("4", "")]
  [TestCase("_", "")]
  [TestCase("Word_m_Field", "Word Field")]
  [TestCase("Word_4_Field", "Word Field")]

  [TestCase("a4", "a4")]
  [TestCase("ABC", "ABC")]
  [TestCase("abc", "abc")]
  [TestCase("AbCd", "Ab Cd")]
  [TestCase("AbcCde", "Abc Cde")]
  [TestCase("ABCCde", "ABC Cde")]

  [TestCase("Abc42Cde", "Abc42 Cde")]
  [TestCase("Abc42cde", "Abc42cde")]
  [TestCase("ABC42Cde", "ABC42 Cde")]
  [TestCase("42ABC", "42 ABC")]
  [TestCase("42abc", "42abc")]

  [TestCase("abc_cde", "abc cde")]
  [TestCase("Abc_Cde", "Abc Cde")]
  [TestCase("_Abc__Cde_", "Abc Cde")]
  [TestCase("ABC_CDE_FGH", "ABC CDE FGH")]
  [TestCase("ABC CDE FGH", "ABC CDE FGH")] // Should not happen (white char); anything that is not a letter/digit/'_' is considered a separator
  [TestCase("ABC,CDE;FGH", "ABC CDE FGH")] // Should not happen (,;); anything that is not a letter/digit/'_' is considered a separator
  [TestCase("abc<cde", "abc cde")]
  [TestCase("abc<>cde", "abc cde")]
  [TestCase("abc<D>cde", "abc cde")]  // Ignore one letter or one digit words
  [TestCase("abc<Da>cde", "abc Da cde")]
  [TestCase("abc<cde>", "abc cde")]

  [TestCase("SimpleHTTPServer", "Simple HTTP Server")]
  [TestCase("SimpleHTTPS2erver", "Simple HTTPS2erver")]
  [TestCase("camelCase", "camel Case")]
  [TestCase("m_Field", "Field")]
  [TestCase("mm_Field", "mm Field")]
  public void Test_GetWords(string identifier, string expectedWordsStr) {
     var expectedWords = expectedWordsStr.Split(' ');
     if (identifier == null || identifier.Length <= 1) {
        expectedWords = new string[0];
     }

     var words = identifier.GetWords();
     Assert.IsTrue(words.SequenceEqual(expectedWords));
  }

John Smith · Answer 18 · 2018-10-25T15:45:42.477

A simple solution, which should be order(s) of magnitude faster than a regex solution (based on the tests I ran against the top solutions in this thread), especially as the size of the input string grows:

string s1 = "ThisIsATestStringAbcDefGhiJklMnoPqrStuVwxYz";
string s2;
StringBuilder sb = new StringBuilder();

foreach (char c in s1)
    sb.Append(char.IsUpper(c)
        ? " " + c.ToString()
        : c.ToString());

s2 = sb.ToString();

LionGod8 · Answer 19 · 2023-02-23T03:09:33.743

Regex.Replace(str, @"(\p{Ll}(?=[\p{Lu}0-9])|\p{Lu}(?=\p{Lu}\p{Ll}|[0-9])|[0-9](?=\p{L}))", "$1 ")

It deals with all Unicode characters, plus it works fine if your string is a regular sentence that contains a camel case expression (and you want to keep the sentence intact but to break the camel case into words, without duplicating spaces etc).

I took Markus Jarderot's answer which is excellent (so credits to him) and replaced [A-Z] with \p{Lu} and [a-z] with \p{Ll} and modified the last part to deal with numbers.
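
A small sketch of what the switch to Unicode categories changes in practice. The `UnicodeSplitDemo` class and its method names are only for illustration, and the sample strings are my own; `\u00E9` is é and `\u00C9` is É.

```csharp
using System.Text.RegularExpressions;

public static class UnicodeSplitDemo
{
    // ASCII-only, digit-aware pattern from the accepted answer.
    public static string Ascii(string s) =>
        Regex.Replace(s,
            "([a-z](?=[A-Z]|[0-9])|[A-Z](?=[A-Z][a-z]|[0-9])|[0-9](?=[^0-9]))",
            "$1 ");

    // Unicode-category pattern from this answer.
    public static string Unicode(string s) =>
        Regex.Replace(s,
            @"(\p{Ll}(?=[\p{Lu}0-9])|\p{Lu}(?=\p{Lu}\p{Ll}|[0-9])|[0-9](?=\p{L}))",
            "$1 ");

    // Sample input "SupérieureÉcole", written with escapes to avoid
    // source-file encoding issues.
    public const string Sample = "Sup\u00E9rieure\u00C9cole";
}
```

`Ascii(Sample)` leaves the string untouched because É is outside `[A-Z]`, while `Unicode(Sample)` inserts the space before É. On a full sentence such as "I have 2 camelCase words", the Unicode version gives "I have 2 camel Case words", whereas the ASCII version doubles the space after the 2.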

If you want numbers to trail after acronyms (e.g. HTML5Guide ⮕ HTML5 Guide):

Regex.Replace(str, @"(\p{Ll}(?=[\p{Lu}0-9])|\p{Lu}(?=\p{Lu}\p{Ll})|[0-9](?=\p{L}))", "$1 ")

Another approach

Just another approach to solve the problem:

Regex.Replace(str, @"((?<=[\p{Ll}0-9])\p{Lu}|(?<=\p{Lu})\p{Lu}(?=\p{Ll})|(?<=\p{L})[0-9]|(?<=[0-9])\p{Ll})", " $1")

More Options

If you want numbers to trail after acronyms (e.g. HTML5Guide ⮕ HTML5 Guide):

Regex.Replace(str, @"((?<=[\p{Ll}0-9])\p{Lu}|(?<=\p{Lu})\p{Lu}(?=\p{Ll})|(?<=\p{Ll})[0-9]|(?<=[0-9])\p{Ll})", " $1")

If you want numbers to trail after any word (e.g. Html5Guide ⮕ Html5 Guide):

Regex.Replace(str, @"((?<=[\p{Ll}0-9])\p{Lu}|(?<=\p{Lu})\p{Lu}(?=\p{Ll})|(?<=[0-9])\p{Ll})", " $1")

If you don't want to deal with numbers and you're sure to not have them in the string:

Regex.Replace(str, @"((?<=\p{Ll})\p{Lu}|(?<=\p{Lu})\p{Lu}(?=\p{Ll}))", " $1")

For a simpler version (ignoring special Unicode characters like é as in fiancé),
pick any of the above regexes and simply
replace \p{Lu} with [A-Z], \p{Ll} with [a-z] and \p{L} with [A-Za-z].
