I'd like to match CJK characters, but the following regex, [[:alpha:]]\+, does not work. Does anybody know how to match CJK characters?
$ echo '程 a b' | sed -e 's/\([[:alpha:]]\+\)/x\1/g'
程 xa xb
The desired output is x程 a b.
With Perl, your solution will look like
perl -CSD -Mutf8 -pe 's/\p{Han}+/x$&/g' filename
Or, with Perl versions older than 5.20 (where using $& slows down every regex match in the program), use a capturing group:
perl -CSD -Mutf8 -pe 's/(\p{Han}+)/x$1/g' filename
To modify the file contents in place, add the -i option:
perl -i -CSD -Mutf8 -pe 's/(\p{Han}+)/x$1/g' filename
NOTES
\p{Han} matches a single Chinese (Han script) character; \p{Han}+ matches runs of one or more such characters.
$1 is the backreference to the value captured with (\p{Han}+); $& inserts the whole match value into the replacement.
-Mutf8 lets Perl recognize UTF-8-encoded characters used directly in your Perl code.
-CSD (equivalent to -CIOED) enables input decoding and output re-encoding (it will work for UTF-8 encoding).