I am trying to scrape data from a webpage, and I'm having trouble manipulating the strings. The website is in French, and I want the data in the table at the bottom of the page. In French, the thousands separator is either a period (.) or a space, and this page uses spaces.
Here is my code to scrape the values in the second column:
library(rvest)
# Download and parse the page
link <- read_html("http://perspective.usherbrooke.ca/bilan/servlet/BMTendanceStatPays?langue=fr&codePays=NOR&codeTheme=1&codeStat=SP.POP.TOTL")
# Select the cells of the second column and extract their text
pop <- link %>%
  html_nodes(".tableauBarreDroite") %>%
  html_text()
head(pop)
[1] "3Â 581Â 239" "3Â 609Â 800" "3Â 638Â 918" "3Â 666Â 537" "3Â 694Â 339" "3Â 723Â 168"
The values in pop contain the expected spaces, along with an unexpected Â. I tried the following to remove the spaces:
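To check what the separator really is, the sketch below prints the Unicode code point of each character. It uses a hypothetical hard-coded string `x` instead of the live page and assumes each separator is "Â" (U+00C2) followed by a no-break space (U+00A0). That pair is what the UTF-8 bytes C2 A0 look like when they are misread as Latin-1, which would explain the stray Â, but this is an assumption I have not confirmed against the page itself.

```r
# Hypothetical sample: assumes each separator is "Â" (U+00C2) followed by a
# no-break space (U+00A0), i.e. UTF-8 bytes C2 A0 misread as Latin-1
x <- "3\u00c2\u00a0581\u00c2\u00a0239"

# Print every character's Unicode code point in hexadecimal
codes <- sprintf("U+%04X", utf8ToInt(x))
print(codes)

# For comparison: the ordinary space typed in gsub(" ", ...) is U+0020 (32)
utf8ToInt(" ")
```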
new.pop <- gsub(pattern = " ", replacement = "", x = pop)
head(new.pop)
[1] "3Â 581Â 239" "3Â 609Â 800" "3Â 638Â 918" "3Â 666Â 537" "3Â 694Â 339" "3Â 723Â 168"
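If the separator is indeed a no-break space, that would explain why this `gsub()` call changes nothing: the pattern `" "` is an ordinary space (U+0020), which never occurs in the string. The sketch below, again on a hypothetical sample string, shows that matching U+00A0 directly removes it while the Â remains.

```r
x <- "3\u00c2\u00a0581\u00c2\u00a0239"  # hypothetical sample, as scraped

# " " is U+0020, which does not occur in x, so nothing is replaced
no_change <- gsub(" ", "", x)

# Matching the no-break space itself removes it, but the stray Â stays
nbsp_gone <- gsub("\u00a0", "", x, fixed = TRUE)
print(nbsp_gone)
```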
The spaces are still present in new.pop. I also tried removing tabs instead:
new.pop <- gsub(pattern = "\t", replacement = "", x = pop)
head(new.pop)
[1] "3Â 581Â 239" "3Â 609Â 800" "3Â 638Â 918" "3Â 666Â 537" "3Â 694Â 339" "3Â 723Â 168"
As you can see, the spaces are still there. How can I remove the unwanted characters and convert pop into a numeric vector?
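One possible approach, sketched under the assumption that every value is a whole number (no decimal comma or minus sign), is to delete every character that is not an ASCII digit and then call `as.numeric()`. The helper name `clean_pop` and the sample values are hypothetical.

```r
# Hypothetical helper: keep only ASCII digits, then convert to numbers.
# Assumes whole numbers only; a decimal comma or minus sign would be lost.
clean_pop <- function(x) {
  as.numeric(gsub("[^0-9]", "", x))
}

# Hypothetical sample mimicking the scraped values ("Â" + no-break space)
pop_sample <- c("3\u00c2\u00a0581\u00c2\u00a0239", "3\u00c2\u00a0609\u00c2\u00a0800")
clean_pop(pop_sample)
# [1] 3581239 3609800
```

Stripping everything but digits sidesteps the question of which separator character is present, at the cost of also discarding any decimal mark, so it would need adjusting for non-integer data.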