Let me explain for you the scenario:
- Website is generating a
session id behind that parameter X-Session-Id which is dynamically generated once you visit the main page page index. So i called it via GET request and I've picked it up from the headers response.
I've figured out an POST request which is automatically generated before you hit your desired url which is actually using the session id which we collected before. here is it https://covid19tracker.health.ny.gov/vizql/w/NYS-COVID19-Tracker/v/NYSDOHCOVID-19Tracker-Fatalities/clear/sessions/{session id}
Now we can call your target which is https://covid19tracker.health.ny.gov/views/NYS-COVID19-Tracker/NYSDOHCOVID-19Tracker-Fatalities?%3Aembed=yes&%3Atoolbar=no&%3Atabs=n.
Now I noticed another XHR request to the back-end API. But before we do the call, We will parse the HTML content for picking up the time object which is responsible on generating the data freshly from the API so we will get an instant data (consider it like a live chat actually). in our case it's behind lastUpdatedAt inside the HTML
I noticed as well that we will need to pickup the recent X-Session-Id generated from our previous POST request.
Now we will make the call using our picked up session to https://covid19tracker.health.ny.gov/vizql/w/NYS-COVID19-Tracker/v/NYSDOHCOVID-19Tracker-Fatalities/bootstrapSession/sessions/{session}
Now we have received the full response. you can parse it or do whatever you want.
import requests
import re
data = {
'worksheetPortSize': '{"w":1536,"h":1250}',
'dashboardPortSize': '{"w":1536,"h":1250}',
'clientDimension': '{"w":1536,"h":349}',
'renderMapsClientSide': 'true',
'isBrowserRendering': 'true',
'browserRenderingThreshold': '100',
'formatDataValueLocally': 'false',
'clientNum': '',
'navType': 'Reload',
'navSrc': 'Top',
'devicePixelRatio': '2.5',
'clientRenderPixelLimit': '25000000',
'allowAutogenWorksheetPhoneLayouts': 'true',
'sheet_id': 'NYSDOH%20COVID-19%20Tracker%20-%20Fatalities',
'showParams': '{"checkpoint":false,"refresh":false,"refreshUnmodified":false}',
'filterTileSize': '200',
'locale': 'en_US',
'language': 'en',
'verboseMode': 'false',
':session_feature_flags': '{}',
'keychain_version': '1'
}
def main(url):
with requests.Session() as req:
r = req.post(url)
sid = r.headers.get("X-Session-Id")
r = req.post(
f"https://covid19tracker.health.ny.gov/vizql/w/NYS-COVID19-Tracker/v/NYSDOHCOVID-19Tracker-Fatalities/clear/sessions/{sid}")
r = req.get(
"https://covid19tracker.health.ny.gov/views/NYS-COVID19-Tracker/NYSDOHCOVID-19Tracker-Fatalities?%3Aembed=yes&%3Atoolbar=no&%3Atabs=n")
match = re.search(r"lastUpdatedAt.+?(\d+),", r.text).group(1)
time = '{"featureFlags":"{\"MetricsAuthoringBeta\":false}","isAuthoring":false,"isOfflineMode":false,"lastUpdatedAt":xxx,"workbookId":9}'.replace(
'xxx', f"{match}")
data['stickySessionKey'] = time
nid = r.headers.get("X-Session-Id")
r = req.post(
f"https://covid19tracker.health.ny.gov/vizql/w/NYS-COVID19-Tracker/v/NYSDOHCOVID-19Tracker-Fatalities/bootstrapSession/sessions/{nid}", data=data)
print(r.text)
main("https://covid19tracker.health.ny.gov")