I have this regex
\b(t[úu]s*)\b
And I have these words:
tu (works) tú (doesn't work) tus (works) tús (works)
Why can't I match tú?
If the regex doesn't match, the ú in your pattern and the ú in your input are most likely not the same character sequence.
"u with acute" can be written either as the single precomposed character ú (U+00FA) or as a plain u (U+0075) followed by the combining acute accent (U+0301), which renders as a nearly identical ú.
You have to either convert (normalize) your input string or include both variants in your regular expression; see http://www.regular-expressions.info/unicode.html for details.
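As a rough illustration of the "convert your input" option, here is a minimal Python sketch. The question does not say which language or regex engine is in use, so Python and the helper name find_words are assumptions made only for this example. It normalizes the text to NFC (the precomposed form) before matching, so both spellings of ú reach the regex as U+00FA.

```python
import re
import unicodedata

# The pattern from the question, with the precomposed ú written as U+00FA.
PATTERN = re.compile(r"\b(t[\u00fau]s*)\b")

def find_words(text):
    """Hypothetical helper: normalize to NFC, then apply the pattern."""
    normalized = unicodedata.normalize("NFC", text)
    return PATTERN.findall(normalized)

precomposed = "t\u00fa"   # t + ú as one code point (U+00FA)
decomposed = "tu\u0301"   # t + u + combining acute accent (U+0301)

# The two strings look alike but compare unequal until normalized.
print(precomposed == decomposed)
print(unicodedata.normalize("NFC", decomposed) == precomposed)
print(find_words(decomposed))
```

Normalizing the input keeps the pattern simple; the alternative is to list both forms in the pattern itself, for example (?:ú|u\u0301) in place of ú.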
Why doesn't that expression match tú?
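That expression doesn't match tú because \b doesn't seem to recognize ú as a word character. After tú the pattern needs a boundary between ú and the following space (or the end of the string), but both sides count as non-word characters, so \b fails there. tús still matches because it ends in s, which is a word character.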
You could use something like this instead:
/(?<!\p{L})(t[úu]s*)(?!\p{L})/u
\p{L} matches any Unicode letter, so the two lookarounds only require that the match is neither preceded nor followed by a letter.
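The same behaviour can be reproduced in Python as a sketch, under a few assumptions: the original engine is not named in the question, so re.ASCII is used here only to imitate an engine whose \b is ASCII-only, and because Python's standard re module does not support \p{L}, the class [^\W\d_] stands in as an approximation of "a letter". The names ASCII_BOUNDARY, LETTER and LETTER_BOUNDARY are made up for this example.

```python
import re

# Imitates an engine where \b only knows ASCII word characters.
ASCII_BOUNDARY = re.compile(r"\b(t[\u00fau]s*)\b", re.ASCII)

# Approximation of \p{L}: a word character that is neither a digit nor "_".
LETTER = r"[^\W\d_]"

# Lookarounds instead of \b: no letter directly before or after the match.
LETTER_BOUNDARY = re.compile(rf"(?<!{LETTER})(t[\u00fau]s*)(?!{LETTER})")

words = "tu t\u00fa tus t\u00fas"

print(ASCII_BOUNDARY.findall(words))    # tú is missing from this list
print(LETTER_BOUNDARY.findall(words))   # all four words are found
print(LETTER_BOUNDARY.findall("tuya estu"))  # no match inside longer words
```

With the lookarounds, the check no longer depends on what the engine considers a word character, only on whether the neighbouring characters are letters.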