There was an earlier question about a CamelCase regex. Combined with tchrist's post, it made me wonder what the correct UTF-8 CamelCase regex is.
Starting with brian d foy's regex:
/
    \b          # start at word boundary
    [A-Z]       # start with upper
    [a-zA-Z]*   # followed by any alpha
    (?:  # non-capturing grouping for alternation precedence
       [a-z][a-zA-Z]*[A-Z]   # next bit is lower, any zero or more, ending with upper
          |                     # or 
       [A-Z][a-zA-Z]*[a-z]   # next bit is upper, any zero or more, ending with lower
    )
    [a-zA-Z]*   # anything that's left
    \b          # end at word boundary
/x
and modifying it to:
/
    \b          # start at word boundary
    \p{Uppercase_Letter}     # start with upper
    \p{Alphabetic}*          # followed by any alpha
    (?:  # non-capturing grouping for alternation precedence
       \p{Lowercase_Letter}[a-zA-Z]*\p{Uppercase_Letter}   ### next bit is lower, any zero or more, ending with upper
          |                  # or 
       \p{Uppercase_Letter}[a-zA-Z]*\p{Lowercase_Letter}   ### next bit is upper, any zero or more, ending with lower
    )
    \p{Alphabetic}*          # anything that's left
    \b          # end at word boundary
/x
I have a problem with the lines marked '###': they still use the ASCII-only class [a-zA-Z].
In addition, how should the regex be modified if numbers and the underscore are treated as equivalent to lowercase letters, so that W2X3 is a valid CamelCase word?
Update (in response to ysth's comment):
In the list below,
- any: means "uppercase letter or lowercase letter or number or underscore"
The regex should match words such as CamelWord and CaW, that is, a word consisting of:
- an uppercase letter at the start
- optional any
- a lowercase letter or number or underscore
- optional any
- an uppercase letter
- optional any
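To make the rules above concrete, here is a rough Perl sketch of how they might translate into a pattern. It is not a confirmed answer, only a direct transcription under assumptions: "number" is taken to mean \p{Decimal_Number}, titlecase letters and combining marks are ignored, and the names $any, $camel and is_camel are made up for illustration.

```perl
use strict;
use warnings;
use utf8;                                  # the source contains non-ASCII literals
binmode(STDOUT, ':encoding(UTF-8)');

# "any": uppercase or lowercase letter, decimal digit, or underscore
# (assumption: "number" means \p{Decimal_Number})
my $any = qr/[\p{Uppercase_Letter}\p{Lowercase_Letter}\p{Decimal_Number}_]/;

my $camel = qr/
    \b                                          # start at word boundary
    \p{Uppercase_Letter}                        # start with upper
    $any*                                       # optional any
    [\p{Lowercase_Letter}\p{Decimal_Number}_]   # lower, number or underscore
    $any*                                       # optional any
    \p{Uppercase_Letter}                        # upper
    $any*                                       # optional any
    \b                                          # end at word boundary
/x;

# Hypothetical helper: true if the string contains a CamelCase word
sub is_camel {
    my ($word) = @_;
    return $word =~ $camel ? 1 : 0;
}

for my $word (qw(CamelWord CaW W2X3 ÉcoleNormale Camel camelCase)) {
    printf "%-14s %s\n", $word, is_camel($word) ? 'match' : 'no match';
}
```

With this sketch, CamelWord, CaW, W2X3 and ÉcoleNormale are expected to match, while Camel and camelCase are not. Whether \b and these three properties are the right building blocks for all scripts is exactly what the question is asking.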
Please do not mark this as a duplicate, because it is not. The original question (and its answers) considered only ASCII.
 
     
    