Scraping login protected website with a challenge form?

Question

I'm trying to do some web scraping from steamspy.com, specifically the total playtime hours for a certain game. That info is behind the login wall for the site, so I've been trying to figure out how to get R past it for html mining.

I tried this method for passing login credentials via POST() but it doesn't seem to work. I noticed that the login handler for that example used POST, whereas looking at the source code for steamspy it seems to use a challenge form and I wasn't sure how to proceed with R.

My attempt thus far looks like this:

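library(httr)

# Reuse one handle so cookies persist across requests to the site
handle <- handle("http://steamspy.com")
path <- "/login/"

# Values copied from the page source after logging in
login <- list(
  jschl_vc = "bc4e...",
  pass = "148..."
)

response <- POST(handle = handle, path = path, body = login)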

I found the values for the jschl_vc and pass from inspecting the source code after I logged in. The code above doesn't work and gives me:

Error in curl::curl_fetch_memory(url, handle = handle) : Failure when receiving data from the peer

probably because I'm trying to use POST on a challenge form. Is there a way to proceed that I'm missing?
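One way to check what the form actually expects, rather than assuming it takes a POST, might be to parse the page source and list the form's method, action, and input fields. The sketch below does that with the xml2 package on made-up markup. The helper `form_fields`, the `sample_html` string, its action path, and its field values are all hypothetical stand-ins, and the real page may be structured differently.

```r
library(xml2)

# Hypothetical helper: report the method, action, and named inputs of the
# first <form> found in an HTML document.
form_fields <- function(html) {
  doc <- read_html(html)
  form <- xml_find_first(doc, "//form")
  inputs <- xml_find_all(form, ".//input[@name]")
  list(
    # HTML forms default to GET when no method attribute is given
    method = toupper(xml_attr(form, "method", default = "get")),
    action = xml_attr(form, "action"),
    fields = setNames(
      as.list(xml_attr(inputs, "value", default = "")),
      xml_attr(inputs, "name")
    )
  )
}

# Made-up markup standing in for the real page source (an assumption,
# not copied from steamspy.com).
sample_html <- '
<form id="challenge-form" action="/some/path" method="get">
  <input type="hidden" name="jschl_vc" value="abc"/>
  <input type="hidden" name="pass" value="123"/>
</form>'

info <- form_fields(sample_html)
str(info)
```

If the method reported for the real page is GET rather than POST, or the action is not /login/, that would at least narrow down why the POST() call above fails.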

Why are you just not using the API they provide? http://steamspy.com/api.php — hrbrmstr, Feb 15 '17 at 19:41
In this specific case, the total playtime hours are not a pullable result from the API. In fact none of the graph data is. The graph data is stored on the html of the page, however, with the total playtime hours histogram stored on the html of the page post-login. — AI52487963, Feb 15 '17 at 19:53

0 Answers