I have been using the stringi package for a while now and everything has worked fine.
I recently wanted to put some regex inside a function and store that function in a separate file. The code works as expected when the function is defined directly in the script, but when the function is loaded from the separate file with source() I do not get the expected result.
Here is the code to reproduce the issue:
library(stringi)
# Replace every character that is not in the allowed set with a space
clean <- function(text) {
  stri_replace_all_regex(str = text,
                         pattern = "(?i)[^a-zàâçéèêëîïôûùüÿñæœ0-9,\\.\\?!']",
                         replacement = " ")
}
text <- "A sample text with some french accent é, è, â, û and some special characters |, [, ( that needs to be cleaned."
clean(text) # OK
[1] "A sample text with some french accent é, è, â, û and some special characters , , that needs to be cleaned."
source("clean.r")
clean(text) # KO
[1] "A sample text with some french accent , , , and some special characters , , that needs to be cleaned."
I want to remove everything that is not a letter (accented or not), a digit, an apostrophe, or one of the punctuation characters ? ! , and .
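For reference, the character class described above can also be written with \u escapes, so that the file holding the function contains only ASCII characters. This is only a sketch: the name clean_ascii is made up for illustration, and whether it changes the sourced result is not something verified here.

```r
library(stringi)

# Same allowed set as in clean(), with each accented letter spelled as a
# Unicode escape (for example \u00e9 is e-acute, \u0153 is the oe ligature).
clean_ascii <- function(text) {
  stri_replace_all_regex(
    str = text,
    pattern = paste0(
      "(?i)[^a-z",
      "\u00e0\u00e2\u00e7\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef",
      "\u00f4\u00fb\u00f9\u00fc\u00ff\u00f1\u00e6\u0153",
      "0-9,\\.\\?!']"
    ),
    replacement = " "
  )
}

# Accented letters should be kept, the pipe should become a space.
clean_ascii("caf\u00e9 | th\u00e9")
```

Because the escapes are resolved by R itself, the pattern no longer depends on how the file's bytes are interpreted when it is read.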
To summarise: when the function is defined directly in the script, the accented letters are kept; when it is sourced from clean.r, they are replaced along with the special characters.
I also tried stringr and got the same result. My files are saved with UTF-8 encoding.
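Since the files are UTF-8 while the session locale listed below is French_France.1252, one possibility is that source() reads the file in the native encoding rather than as UTF-8. Below is a minimal sketch of declaring the encoding explicitly through the encoding argument of source(). The temporary file and the function keep_e are stand-ins invented for the example, not the real clean.r, and whether this resolves the issue above is an assumption rather than a confirmed result.

```r
library(stringi)

# Temporary file standing in for clean.r (hypothetical content, not the real file).
path <- tempfile(fileext = ".r")

# The accented letter is written as an escape so this snippet stays ASCII;
# the file on disk receives the actual UTF-8 bytes for the character.
code <- "keep_e <- function(x) stringi::stri_replace_all_regex(x, '[^a-z\u00e9]', ' ')"
writeLines(enc2utf8(code), path, useBytes = TRUE)

# Tell source() that the file is UTF-8 instead of relying on the native encoding.
source(path, encoding = "UTF-8")

# The accented letter should survive, the pipe should become a space.
keep_e("\u00e9|a")
```

With the real file, the equivalent call would be source("clean.r", encoding = "UTF-8").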
I do not understand why this is happening. Any help is greatly appreciated.
Thank you.
R version 3.4.1 (2017-06-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=French_France.1252  LC_CTYPE=French_France.1252
[3] LC_MONETARY=French_France.1252 LC_NUMERIC=C
[5] LC_TIME=French_France.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] stringi_1.1.5     data.table_1.10.4

loaded via a namespace (and not attached):
[1] compiler_3.4.1 tools_3.4.1    yaml_2.1.14