I agree that the problem is most likely an encoding issue.
For instance, on nasa.gov the problem seems to appear only on topic pages related to American-Russian space collaboration, which suggests it is caused by Cyrillic characters in the page content.
I worked around it by using the deprecated Relenium package where RSelenium fails. To get Relenium running smoothly on Ubuntu 16.04 I had to install Firefox 25.0 and configure it to prevent any updates. The other setup issue was installing rJava properly, which can fail when the environment variables pointing to the Java libraries are not set.
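For reference, the rJava part of the setup can usually be fixed by pointing R at the JDK and re-running R's Java configuration. A minimal sketch (the JAVA_HOME path below is an assumption and will differ per machine and Java version):

```shell
# Assumed JDK location -- adjust JAVA_HOME to the actual path on your machine.
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export PATH="$JAVA_HOME/bin:$PATH"

# Re-register Java with R so rJava can locate the Java shared libraries:
sudo R CMD javareconf

# Then reinstall rJava from within an R session:
# install.packages("rJava")
```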
System configuration is as follows:
R version 3.3.1 (2016-06-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.1 LTS
relenium_0.3.0; seleniumJars_2.41.0; rJava_0.9-8; RSelenium_1.3.5
Below is an example of a page that can be scraped with Relenium but not with the release version of RSelenium:
link = "http://www.nasa.gov/mission_pages/station/expeditions/expedition14/index.html"
The RSelenium solution fails (with Firefox 34.0.5 as well as 25.0):
library(RSelenium)

startServer()               # start the standalone Selenium server
remDr <- remoteDriver()
remDr$open()
remDr$navigate(link)
doc = unlist(remDr$getPageSource())
Result: "Error in fromJSON(content, handler, default.size, depth, allowComments, :
invalid JSON input"
Relenium, on the other hand, handles it fine:
library(relenium)
library(xml2)               # provides read_html()

relenium_browser <- firefoxClass$new()
relenium_browser$get(link)
doc = unlist(relenium_browser$getPageSource())
doc = read_html(doc)