I am scraping a newspaper website (http://politiken.dk) and have managed to get the titles of all the news articles I need. However, I cannot get the headlines and the full text.

When I try without logging in, the code only gets the first headline of the day I am scraping (not even the one for the first article in my RData list).

I believe I need to log in to get them, right?

So I got a username and a password, but I cannot get any login code to work.

I need to get the headlines of the articles in my RData, whose addresses are stored in the urls column. So the specific URLs for all the articles I need are already available to the code further down.

I found this code for logging in to a website, but I cannot adapt it to my case:

library(httr)
library(XML)

handle <- handle("http://subscribers.footballguys.com")  # I don't know what to put here
path   <- "amember/login.php"                            # I don't know what to put here

# Fields found in the login form.
login <- list(
  amember_login = "username",
  amember_pass  = "password",
  amember_redirect_url =
    "http://subscribers.footballguys.com/myfbg/myviewprojections.php?projector=2"
)

response <- POST(handle = handle, path = path, body = login)

This is my code to get the headlines:

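library(rvest)  # provides read_html(), html_nodes(), html_text() and %>%

headlines <- rep("", nrow(politiken.unique))
for (i in seq_len(nrow(politiken.unique))) {
  # try() keeps the loop running if one page fails to load
  try({
    text <- read_html(as.character(politiken.unique$urls[i])) %>%
      html_nodes(".summary__p") %>%
      html_text(trim = TRUE)
    headlines[i] <- paste(text, collapse = " ")
  })
}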

I tried the suggestion in "Scrape password-protected website in R", but it did not work, or I did not understand how to apply it.

Thanks in advance!

Maria