I am preparing a dataset that contains CJK characters in R, mostly with the Tidyverse. During the process, I found that some character elements have \037 at the very end.
# A tibble: 99 × 2
Prefecture n
<chr> <int>
1 \037 1
2 北海道\037 1
3 北海道 13
4 北海道 4
... ... ...
I have tried to remove them with the lines below:
library(stringr)
out.file %>% mutate(
Prefecture = str_replace_all(out.file$Prefecture, "\\\\037", "")
)
str_replace_all does remove every \037 when I test it on a single string. When I apply it to the entire column through mutate, however, the lines above still give the same result as the first code chunk in this post.
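One possible explanation, sketched here under assumptions, is that the test string and the column do not hold the same thing. In an R string literal, "\037" is an octal escape for the single control character U+001F, whereas "\\037" is a literal backslash followed by the digits 037. The values `ctrl` and `lit` below are hypothetical stand-ins, not taken from the real data.

```r
library(stringr)

# Hypothetical sample values (assumptions, not the real data)
ctrl <- "北海道\037"   # ends with the control character U+001F (octal 037)
lit  <- "北海道\\037"  # ends with a literal backslash followed by "037"

# The R string "\\\\037" is the regex \\037: a literal backslash, then "037"
hit_ctrl_escaped <- str_detect(ctrl, "\\\\037")  # FALSE
hit_lit_escaped  <- str_detect(lit,  "\\\\037")  # TRUE

# The R string "\037" is the control character itself, used as the pattern
hit_ctrl_raw <- str_detect(ctrl, "\037")         # TRUE
```

If the column behaves like `ctrl`, the pattern "\\\\037" would never match it, which would be consistent with the column coming back unchanged.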
What would be the most efficient way to remove them from strings?
Update with solution
library(stringi)
out.file %>%
  mutate(Prefecture = stri_escape_unicode(Prefecture),         # escape non-ASCII and control characters as \uXXXX text
         Prefecture = str_replace_all(Prefecture, "\037", ""),
         Prefecture = stri_unescape_unicode(Prefecture))       # turn the escapes back into characters
This way I am able to resolve the issue successfully.
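For comparison, here is a minimal sketch of a more direct route, assuming the trailing \037 really is the control character U+001F rather than literal text. It skips the escape and unescape round trip and assigns the result to a new object. The small tibble below is a hypothetical stand-in for out.file, and `cleaned` is a name introduced only for illustration.

```r
library(dplyr)
library(stringr)

# Hypothetical stand-in for out.file (an assumption, not the real data)
out.file <- tibble(
  Prefecture = c("\037", "北海道\037", "北海道"),
  n = c(1L, 1L, 13L)
)

# "\037" in an R string is the control character itself, so it can be
# used directly as the pattern; refer to the column by its bare name
cleaned <- out.file %>%
  mutate(Prefecture = str_remove_all(Prefecture, "\037"))

cleaned$Prefecture
```

Note that mutate returns a new tibble and leaves out.file untouched, so the result has to be assigned somewhere for the change to persist.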