I am scraping a newspaper website (http://politiken.dk) and I was able to get the titles of all the news articles I need, but I can't get the headlines + full text.
When I try without logging in, the code only gets the first headline of the day I am scraping (not even the one that comes first in my RData list).
I believe I need to log in to get them, right?
So I got a username and a password, but I cannot make any login code work.
I need the headlines for the articles in my RData, using the urls column. So the specific URLs for all the articles I need are already in my code (below).
I found this code for logging in to another website, but I cannot adapt it to my case:
library(httr)
library(XML)

handle <- handle("http://subscribers.footballguys.com")  # I DON'T KNOW WHAT TO PUT HERE
path <- "amember/login.php"                              # I DON'T KNOW WHAT TO PUT HERE

# fields found in the login form.
login <- list(
  amember_login = "username",
  amember_pass = "password",
  amember_redirect_url = "http://subscribers.footballguys.com/myfbg/myviewprojections.php?projector=2"
)

response <- POST(handle = handle, path = path, body = login)
This is my code to get the headlines:
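library(rvest)  # provides read_html(), html_nodes(), html_text() and %>%

# one empty slot per article URL
headlines <- rep("", nrow(politiken.unique))

for (i in seq_len(nrow(politiken.unique))) {
  # try() keeps the loop going if a single page fails to load
  try({
    text <- read_html(as.character(politiken.unique$urls[i])) %>%
      html_nodes(".summary__p") %>%
      html_text(trim = TRUE)
    # collapse all matching paragraphs into one string
    headlines[i] <- paste(text, collapse = " ")
  })
}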
I also tried the suggestion in "Scrape password-protected website in R",
but it did not work, or I did not know how to apply it to my case.
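As far as I understand that suggestion, the idea is to log in once with an rvest session and then reuse the same session (and its cookies) for every article URL. Below is a minimal sketch of what I think it should look like, not something I have working. It assumes rvest 1.0 or later. The login URL, the position of the login form on the page, and the field names username and password are all placeholders I made up, because I don't know the real ones for this site.

```r
library(rvest)

# Collapse every node matching `css` on one parsed page into a single string.
extract_summary <- function(page, css = ".summary__p") {
  nodes <- html_elements(page, css)
  paste(html_text(nodes, trim = TRUE), collapse = " ")
}

# Log in once, then visit each URL with the logged-in session.
# `login_url` and the form field names are hypothetical placeholders.
scrape_with_login <- function(login_url, user, pass, urls) {
  s <- session(login_url)
  form <- html_form(s)[[1]]  # assumption: the login form is the first form on the page
  form <- html_form_set(form, username = user, password = pass)  # assumed field names
  s <- session_submit(s, form)

  vapply(urls, function(u) {
    # return NA for a page that fails instead of stopping the whole run
    tryCatch(
      extract_summary(read_html(session_jump_to(s, u))),
      error = function(e) NA_character_
    )
  }, character(1), USE.NAMES = FALSE)
}
```

If this is the right direction, I would call scrape_with_login() with as.character(politiken.unique$urls) as the urls argument, but I still don't know what to pass as the login URL or what the real field names are.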
Thanks in advance!