I'm trying to scrape steamspy.com, specifically the total playtime hours for a certain game. That info is behind the site's login wall, so I've been trying to figure out how to get R past it to scrape the HTML.

I tried this method for passing login credentials via POST(), but it doesn't seem to work. The login handler in that example used POST, whereas the page source for steamspy suggests it uses a challenge form, and I'm not sure how to proceed in R.

My attempt thus far looks like this:

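library(httr)

# Reuse one handle so cookies persist across requests
handle <- handle("http://steamspy.com")
path <- "/login/"

# Hidden form fields copied from the page source (values truncated here)
login <- list(
  jschl_vc = "bc4e...",
  pass = "148..."
)

response <- POST(handle = handle, path = path, body = login)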

I found the values for the jschl_vc and pass from inspecting the source code after I logged in. The code above doesn't work and gives me:

Error in curl::curl_fetch_memory(url, handle = handle) : Failure when receiving data from the peer

probably because I'm trying to POST to a challenge form. Is there a way to proceed that I'm missing?
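Before choosing between POST() and GET(), it may help to read the form's method, action, and hidden fields straight from the HTML rather than copying values by hand. The following is only a minimal sketch using the xml2 package: the `page_html` string, its `action` path, and the placeholder values are hypothetical stand-ins, and the markup steamspy actually serves may differ. In practice the string would come from something like `content(GET(...), as = "text")`.

```r
library(xml2)

# Hypothetical stand-in for the HTML returned by the login URL.
# The real page's action path, field names, and values are assumptions here.
page_html <- '
<html><body>
  <form id="challenge-form" action="/hypothetical/check" method="get">
    <input type="hidden" name="jschl_vc" value="PLACEHOLDER_VC"/>
    <input type="hidden" name="pass" value="PLACEHOLDER_PASS"/>
  </form>
</body></html>'

doc <- read_html(page_html)
forms <- xml_find_all(doc, "//form")

# Summarise one form: where it submits, with which HTTP method,
# and which named input fields (with their current values) it carries.
describe_form <- function(form) {
  inputs <- xml_find_all(form, ".//input")
  list(
    action = xml_attr(form, "action"),
    # HTML forms default to GET when no method attribute is given
    method = toupper(xml_attr(form, "method", default = "get")),
    fields = setNames(xml_attr(inputs, "value"), xml_attr(inputs, "name"))
  )
}

form_info <- lapply(forms, describe_form)
str(form_info)
```

If the reported method turns out to be GET rather than POST, or the action points somewhere other than /login/, that would suggest the request in the attempt above is aimed at the wrong endpoint. A form whose values are filled in by JavaScript cannot be completed from the static HTML alone.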

Maxim
AI52487963
  • Why are you just not using the API they provide? http://steamspy.com/api.php – hrbrmstr Feb 15 '17 at 19:41
  • In this specific case, the total playtime hours are not a pullable result from the API. In fact none of the graph data is. The graph data is stored on the html of the page, however, with the total playtime hours histogram stored on the html of the page post-login. – AI52487963 Feb 15 '17 at 19:53
