I am trying to scrape a list from the following URL: https://www.oncomap.de/centers?selectedOrgan=Darm&selectedCounty=Deutschland
Using Chrome's Developer Tools, I can see that the content I am interested in sits inside body > app-root > app-top > div ... . I tried to reach this content with Python's BeautifulSoup4 package, but I cannot get at anything below the app-root tag. I am using the following code:
import pprint

import requests
from bs4 import BeautifulSoup

# Note: the Access-Control-* entries are CORS response headers and have
# no effect when sent with a request; only User-Agent matters here.
headers = {
    'Access-Control-Allow-Origin': '*',
    'Access-Control-Allow-Methods': 'GET',
    'Access-Control-Allow-Headers': 'Content-Type',
    'Access-Control-Max-Age': '3600',
    'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0'
}
url = 'https://www.oncomap.de/centers?selectedOrgan=Darm&selectedCounty=Deutschland'

# Pass headers by keyword; the second positional argument of requests.get is params.
req = requests.get(url, headers=headers)

# The built-in parser is named "html.parser" (with a dot).
soup = BeautifulSoup(req.content, "html.parser")
mat_row = soup.select('body > app-root')

pp = pprint.PrettyPrinter()
for child in mat_row[0].descendants:
    pp.pprint(child)
There is no output from this code: no descendant is printed (I also tried children). I think I am dealing with a ReactJS div here. Does anyone have any hints on how to process such content? Specifically, I would like to scrape the main list on the page into a Python-readable table. Thanks for your help!
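A minimal sketch of how the suspected cause could be checked, assuming the page is rendered client-side by JavaScript (this is a guess based on the symptom, not something confirmed for this site). In that case the HTML the server returns contains an empty app-root element, and the list only appears after a browser runs the scripts. The sample HTML, the DescendantCounter class, and the count_descendants helper below are all hypothetical illustrations, and the example uses the standard library's html.parser instead of BeautifulSoup so that it runs without extra packages.

```python
from html.parser import HTMLParser

# Hypothetical stand-in for the raw HTML of a client-side-rendered page:
# <app-root> is present but empty until JavaScript runs in a browser.
SAMPLE_RAW_HTML = """<!doctype html>
<html>
  <head><title>Example</title></head>
  <body>
    <app-root></app-root>
    <script src="main.js"></script>
  </body>
</html>"""


class DescendantCounter(HTMLParser):
    """Count element tags inside the first <target> element.

    Assumes the target element is not nested inside itself.
    """

    def __init__(self, target):
        super().__init__()
        self.target = target
        self.inside = False    # currently between <target> and </target>
        self.found = False     # whether <target> was seen at all
        self.descendants = 0   # element tags seen inside <target>

    def handle_starttag(self, tag, attrs):
        if self.inside:
            self.descendants += 1
        elif tag == self.target and not self.found:
            self.found = True
            self.inside = True

    def handle_endtag(self, tag):
        if self.inside and tag == self.target:
            self.inside = False


def count_descendants(html, target="app-root"):
    """Return (target_found, number_of_descendant_elements)."""
    parser = DescendantCounter(target)
    parser.feed(html)
    parser.close()
    return parser.found, parser.descendants


found, n = count_descendants(SAMPLE_RAW_HTML)
print(found, n)  # True 0
```

If the same check on the real response text (req.text) also reports that app-root exists with zero descendants, the data is probably loaded by a separate request that shows up in the browser's Network tab, or the page has to be rendered in a real browser before parsing.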