How can I replace all non-word characters (UTF-8) in a string?
For ASCII, I use:
$url = preg_replace("/\W+/", " ", $url);
Is there an equivalent for UTF-8?
Use Unicode properties:
$url = preg_replace("/[^\p{L}\p{N}_]+/u", " ", $url);
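\p{L} matches any Unicode letter.
\p{N} matches any Unicode numeric character (decimal digits as well as other numbers such as Roman numerals and fractions).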
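To see that pattern on non-ASCII input, here is a minimal sketch. The helper name replace_non_word and the sample string are made up for illustration, and the example assumes the source file is saved as UTF-8 with precomposed accented characters.

```php
<?php
// Hypothetical helper (not from the original post): replaces each run of
// non-word characters with a single space.
function replace_non_word(string $text): string
{
    // [^\p{L}\p{N}_]+ : one or more characters that are not a letter,
    // a number, or an underscore. The u modifier makes PCRE read the
    // pattern and the subject as UTF-8.
    $result = preg_replace('/[^\p{L}\p{N}_]+/u', ' ', $text);

    // preg_replace() returns null on error, for example when the subject
    // is not valid UTF-8; fall back to an empty string in that case.
    return $result ?? '';
}

// Made-up sample input; punctuation and symbols collapse to single spaces.
echo replace_non_word("Crème brûlée & café #1"), "\n";
```

The null check is a design choice for this sketch: with the u modifier, invalid UTF-8 input makes preg_replace() fail rather than return a partial result, so you may prefer to report the error instead of silently returning an empty string.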
You can use the Xwd character class that contains letters, digits and underscore:
$url = preg_replace('~\P{Xwd}+~u', ' ', $url);
If you don't want to keep the underscore, use Xan instead.
\p{Xwd} (Perl word character) is a predefined character class and \P{Xwd} is the negation of this class.
The u modifier tells PCRE to treat both the pattern and the subject string as UTF-8.
Equivalences:
\p{Xan} <=> [\p{L}\p{N}]
\p{Xwd} <=> [\p{Xan}_]
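The difference between the two classes can be sketched as follows. The function names and the sample string are hypothetical, and the example assumes a PCRE build that supports the Xwd and Xan properties and a source file saved as UTF-8.

```php
<?php
// Hypothetical helper: keeps letters, numbers and underscores (Xwd).
function strip_non_xwd(string $text): string
{
    // \P{Xwd}+ matches a run of characters outside the Xwd class.
    return preg_replace('~\P{Xwd}+~u', ' ', $text) ?? '';
}

// Hypothetical helper: keeps letters and numbers only (Xan),
// so underscores are replaced as well.
function strip_non_xan(string $text): string
{
    return preg_replace('~\P{Xan}+~u', ' ', $text) ?? '';
}

// Made-up sample input mixing ASCII, accented and CJK characters.
$input = "snake_case, naïve_日本語";

echo strip_non_xwd($input), "\n"; // underscores are kept
echo strip_non_xan($input), "\n"; // underscores become spaces
```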