Right now my regex is something like this:
[a-zA-Z0-9], but it does not include accented characters as I would like. I would also like the dash (-), apostrophe (') and comma (,) to be included.
 
    
    Accented Characters: DIY Character Range Subtraction
If your regex engine allows it (and many will), this will work:
(?i)^(?:(?![×Þß÷þø])[-'0-9a-zÀ-ÿ])+$
Please see the demo (you can add characters to test).
Explanation
(?i) sets case-insensitive mode
^ anchor asserts that we are at the beginning of the string
(?:(?![×Þß÷þø])[-'0-9a-zÀ-ÿ]) matches one character:
    (?![×Þß÷þø]) asserts that the character is not one of those in the brackets
    [-'0-9a-zÀ-ÿ] allows dash, apostrophe, digits, letters, and characters in the wide accented range À-ÿ, from which the lookahead subtracts the characters above
+ matches that one or more times
$ anchor asserts that we are at the end of the string
Reference
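To try the pattern without a demo site, here is a small JavaScript sketch. JavaScript does not accept a leading (?i), so the sketch passes the i flag instead; the sample strings are made up for illustration.

```javascript
// The range-subtraction pattern above, ported to JavaScript.
// JavaScript has no leading (?i) inline modifier, so case-insensitivity comes from the `i` flag.
const accented = /^(?:(?![×Þß÷þø])[-'0-9a-zÀ-ÿ])+$/i;

// Illustrative inputs (not taken from the original demo).
const inputs = ["Crêpe", "O'Neil", "Jean-Luc", "naïve", "5×3", "Straße"];

for (const s of inputs) {
  // × and ß lie inside À-ÿ but are rejected by the negative lookahead.
  console.log(s, accented.test(s));
}
```

One caveat: a decomposed ê (e followed by U+0302 COMBINING CIRCUMFLEX ACCENT) lies outside À-ÿ and will not match. Calling text.normalize("NFC") before testing turns it into the single code point U+00EA.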
 
    
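    Put the Unicode property classes in your expression:
\p{L}\p{M}
Inside a character class, e.g. [\p{L}\p{M}]+, they match any character from either category. In Unicode these match:
\p{L}: any kind of letter from any language
\p{M}: a mark, i.e. a character intended to be combined with another character (accents, umlauts, etc.)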
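In JavaScript these property escapes need the u flag, and they act as alternatives only when placed together in one character class. The sketch below, with made-up inputs, shows why \p{M} matters: a decomposed é (e plus U+0301 COMBINING ACUTE ACCENT) is a letter followed by a mark, which a Latin-1 range such as À-ÿ cannot match.

```javascript
// \p{L} = any letter, \p{M} = any combining mark. Property escapes require the `u` flag.
const lettersAndMarks = /^[\p{L}\p{M}]+$/u;
// A Latin-1-only class, for comparison.
const latin1Only = /^[a-zA-ZÀ-ÿ]+$/;

const precomposed = "caf\u00e9"; // é as one code point (U+00E9)
const decomposed = "cafe\u0301"; // e followed by U+0301 COMBINING ACUTE ACCENT

console.log(lettersAndMarks.test(precomposed)); // true
console.log(lettersAndMarks.test(decomposed));  // true: U+0301 is matched by \p{M}
console.log(latin1Only.test(precomposed));      // true
console.log(latin1Only.test(decomposed));       // false: U+0301 is outside À-ÿ
```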
 
    
    A version without the exclusion rules:
^[-'a-zA-ZÀ-ÖØ-öø-ÿ]+$
Explanation
^ anchor asserts that we are at the beginning of the string
[...] allows dash, apostrophe, letters, and characters in the accented ranges À-Ö, Ø-ö and ø-ÿ, which skip × and ÷ (add 0-9 if you also need digits)
+ matches that one or more times
$ anchor asserts that we are at the end of the string
Reference
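A JavaScript sketch of this variant, with illustrative inputs, shows that × and ÷ are kept out by the range boundaries alone, and that digits are rejected because 0-9 is not in the class.

```javascript
// Lookahead-free version: the ranges À-Ö, Ø-ö and ø-ÿ skip × (U+00D7) and ÷ (U+00F7).
const noLookahead = /^[-'a-zA-ZÀ-ÖØ-öø-ÿ]+$/;

// Illustrative inputs.
const candidates = ["Zoë", "d'Artagnan", "Ærøskøbing", "a÷b", "R2-D2"];
for (const s of candidates) {
  console.log(s, noLookahead.test(s));
}
```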
 
    
    Use a POSIX character class (http://www.regular-expressions.info/posixbrackets.html):
[-'[:alpha:]0-9] or [-'[:alnum:]]
The [:alpha:] character class matches whatever is considered "alphabetic characters" in your locale.
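JavaScript regular expressions do not support POSIX bracket classes such as [:alpha:]. If you need something similar there, one option is the \p{Alphabetic} property with the u flag. It is a substitute rather than an equivalent, since it follows Unicode rather than your locale; the inputs below are illustrative.

```javascript
// JavaScript has no [:alpha:]; \p{Alphabetic} (requires the `u` flag) is a Unicode-based stand-in.
// Rough analogue of [-'[:alpha:]0-9]: dash, apostrophe, any alphabetic character, ASCII digits.
const alphaLike = /^[-'\p{Alphabetic}0-9]+$/u;

console.log(alphaLike.test("Straße-2"));   // true
console.log(alphaLike.test("d'Artagnan")); // true
console.log(alphaLike.test("a,b"));        // false: comma is not in the class
```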
 
    
    @NightCoder's answer works perfectly in PHP, with no brittle whitelists:
    \p{L}\p{M}
Note that to get it working in JavaScript you need to add the Unicode u flag. Here is a working example in JavaScript...
const text = `Crêpes are øh-so déclassée`;
[...text.matchAll(/[-'’\p{L}\p{M}\p{N}]+/giu)]
will return something like...
[
    {
        "0": "Crêpes",
        "index": 0
    },
    {
        "0": "are",
        "index": 7
    },
    {
        "0": "øh-so",
        "index": 11
    },
    {
        "0": "déclassée",
        "index": 17
    }
]
Here it is in a playground... https://regex101.com/r/ifgH4H/1/
And here is more detail on those regex Unicode categories... https://javascript.info/regexp-unicode
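Applied to the original question (letters including accented ones, digits, and - ' , over the whole string), a sketch along the same lines could anchor the class and use 0-9 instead of \p{N}, since \p{N} also accepts non-ASCII numerals such as ² or Ⅻ. The exact allowed set is an assumption based on the question.

```javascript
// Whole-string check: letters (\p{L}), combining marks (\p{M}), ASCII digits, dash, apostrophe, comma.
// The allowed set mirrors the question; add a space or ’ if your input needs them.
const allowed = /^[-',0-9\p{L}\p{M}]+$/u;

console.log(allowed.test("déclassée,1")); // true
console.log(allowed.test("øh-so"));       // true
console.log(allowed.test("x²"));          // false: ² is \p{N} but not 0-9
console.log(allowed.test("two words"));   // false: space is not allowed
```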
