I am implementing an LDA topic model using the tm and topicmodels packages. Some of the documents contain odd characters that are not removed automatically (e.g. `docs <- tm_map(docs, removePunctuation)` does not remove ’). When I read the .txt files into R, the Euro sign €, for example, shows up as €. Other odd characters like these appear frequently throughout the corpus and have to be removed manually, so I use the following lines:
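```r
library(tm)

# toSpace is not shown here; assumed to be the usual tm helper
toSpace <- content_transformer(function(x, pattern) gsub(pattern, " ", x))

docs <- tm_map(docs, toSpace, "’")
docs <- tm_map(docs, toSpace, "‐")
docs <- tm_map(docs, toSpace, "–")
docs <- tm_map(docs, toSpace, "€")
docs <- tm_map(docs, toSpace, "’")
```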
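The garbled strings in these patterns seem to follow a fixed rule: they match the UTF-8 bytes of the intended character decoded as Windows-1252 (CP1252). A minimal base-R sketch to check this, assuming the system's `iconv` recognises the `"CP1252"` encoding name (the helper `as_cp1252` is hypothetical):

```r
# Hypothetical helper: take the UTF-8 bytes of a character and decode them
# as Windows-1252, which is what happens when a UTF-8 file is read with the
# wrong encoding.
as_cp1252 <- function(x) {
  same_bytes <- rawToChar(charToRaw(enc2utf8(x)))  # bytes kept, encoding mark dropped
  iconv(same_bytes, from = "CP1252", to = "UTF-8")
}

as_cp1252("\u20ac")  # euro sign            -> "\u00e2\u201a\u00ac", i.e. "€"
as_cp1252("\u2019")  # right single quote   -> "\u00e2\u20ac\u2122", i.e. "’"
as_cp1252("\u2013")  # en dash              -> "\u00e2\u20ac\u201c", i.e. "–"
```

If this reproduces the strings seen in the corpus, the .txt files are probably UTF-8 and are being read with a different encoding.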
My problem is that once I close the R script and reopen it, these odd symbols change. Instead of ’ the script shows ', and instead of ’ it shows â???T. As a result, the symbols are no longer removed from the text, and I have to fix them by hand every time I reopen the script. My current workaround is to keep these lines in a Word document and paste them back into the R script each time, which is very inefficient.

Is there a way to save the R script so that these odd symbols are preserved after reopening? Or should I do something with my original .txt files instead? Thank you!
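One way to make the patterns independent of how the script file is saved is to write them as `\u` escapes, so the script itself contains only ASCII. A minimal sketch, assuming the .txt files are UTF-8 and are read as UTF-8, so the corpus holds the real characters rather than their garbled form (the names `odd_chars` and `to_space_all` are hypothetical):

```r
# Characters to replace, written as \u escapes so the script is pure ASCII
# and survives being saved in any encoding.
odd_chars <- c(
  "\u2019",  # right single quotation mark
  "\u2010",  # hyphen
  "\u2013",  # en dash
  "\u20ac"   # euro sign
)

# Replace every occurrence of each pattern with a space.
# fixed = TRUE treats the patterns as literal strings, not regexes.
to_space_all <- function(x, patterns = odd_chars) {
  for (p in patterns) {
    x <- gsub(p, " ", x, fixed = TRUE)
  }
  x
}

txt <- "It\u2019s 5\u20ac \u2013 cheap"
out <- to_space_all(txt)
out  # "It s 5    cheap" (extra spaces left for a later whitespace step)
```

In tm this could be applied with `tm_map(docs, content_transformer(to_space_all))`, and the files read as UTF-8 with something like `VCorpus(DirSource("path/to/txt", encoding = "UTF-8"))`, where the path is a placeholder. If the files are actually read with the wrong encoding, the garbled forms could be escaped the same way, e.g. `"\u00e2\u20ac\u2122"` for ’.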
