Logging in to website to access data using Python

Question

I have a subscription to the site https://www.naturalgasintel.com/ for daily data feeds, which show up directly on the site as .txt files. The user login page is https://www.naturalgasintel.com/user/login/

For example, the file for today's feed is at https://naturalgasintel.com/ext/resources/Data-Feed/Daily-GPI/2019/01/20190104td.txt

What I'd like to do is to log in using my user_email and user_password and scrape this data in the form of an Excel file.

To get to the data, I first log in to the site with Twill, using this code:

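from datetime import datetime
from twill.commands import go, fv

# user_email and user_password are assumed to be defined elsewhere.
# Assumption: NOW is a date string such as "2019-01-04"; it was not
# defined in the original snippet, so it is built from today's date here.
NOW = datetime.now().strftime("%Y-%m-%d")

year = NOW[0:4]
month = NOW[5:7]
day = NOW[8:10]
date = year + month + day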

path = "https://naturalgasintel.com/ext/resources/Data-Feed/Daily-GPI/"
end = "td.txt"

go("http://www.naturalgasintel.com/user/login")
fv("2", "user[email]", user_email)
fv("2", "user[password]", user_password)
fv("2", "commit", "Login")

datafilelocation = path + year + "/" + month + "/" + date + end
go(datafilelocation)

However, after logging in from the user login page, going to the data's location sends me back to the login page with a referer parameter:

https://www.naturalgasintel.com/user/login?referer=%2Fext%2Fresources%2FData-Feed%2FDaily-GPI%2F2019%2F01%2F20190104td.txt

Rather than:

https://naturalgasintel.com/ext/resources/Data-Feed/Daily-GPI/2019/01/20190104td.txt
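A script can tell these two outcomes apart by inspecting the URL it finally lands on. Below is a minimal sketch using only the standard library. The function name `login_redirect_target` is hypothetical, and the check assumes the login page path ends in `/user/login`, as in the URLs above.

```python
from urllib.parse import urlsplit, parse_qs


def login_redirect_target(final_url):
    """Return the originally requested path if final_url is the login page.

    Returns None when final_url is not the login page, and an empty string
    when it is the login page but carries no referer parameter.
    """
    parts = urlsplit(final_url)
    # Assumption: the login page path ends in /user/login (optionally with a slash).
    if not parts.path.rstrip("/").endswith("/user/login"):
        return None
    # parse_qs percent-decodes values, so %2F becomes "/".
    referer = parse_qs(parts.query).get("referer")
    return referer[0] if referer else ""
```

Called on the redirect URL above, this returns the path of the requested .txt file. Called on the direct file URL, it returns None.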

I've also tried modules such as requests to log in and then access the data, but whichever method I use, I end up with HTML source rather than the .txt data itself.

I've posted my complete walk-through with the Python 2.7 module Twill, with a bounty attached, here:

Using Twill to grab .txt from login page Python

What would be the best way to access these password-protected files?

Not too sure why I'm getting downvoted. Could I please get some comments at least as to what needs changing? I tried to edit my question so that it's clearer what solution I'm looking for (along with my attempt at a solution). — HelloToEarth, Jan 04 '19 at 16:19
Not sure either, although the title seems suspicious at first. Maybe that's it? I think the question is clear and well written. — mariogarcc, Jan 04 '19 at 16:23
Have you looked at `selenium` btw? It's not gonna be the cleanest way of doing it (it'll actually load a browser while it's running), but it wouldn't be too hard to log in and get the page contents — Peter, Jan 04 '19 at 16:24
I have heard of it, @Peter. I think because I already have a script that runs well that I'd like to try and keep it. The only issue with the script is that the page has changed its contents and the log in mechanisms no longer work properly for reasons I don't understand. — HelloToEarth, Jan 04 '19 at 16:27

B Lean · Answer 1 · 2019-01-07T22:34:27.350

If you have a compatible version of Firefox, get the plugin javascript 0.0.1 by Chee and add the following to run on the page:

document.getElementById('user_email').value = "E-What";
document.getElementById('user_password').value = " ABC Password ";

Replace the email and password with your own. Once the page has loaded, the script fills in your email and password.

There are other ways to do this entirely on your own with a stand-alone process. Done this way, you do not have to download other people's programs and learn them (beyond this one small plugin).
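As one possible shape for such a stand-alone process, here is a rough Python sketch using only the standard library. It keeps the login cookie in a cookie jar and reuses it for the data request. The form field names (`user[email]`, `user[password]`, `commit`) are taken from the Twill code in the question. The function names are hypothetical, and whether the site accepts a plain form POST like this is an assumption I have not verified. If the login form carries hidden fields (for example an anti-forgery token), they would have to be read from the login page and added to the payload, which this sketch does not do.

```python
import datetime
import http.cookiejar
import urllib.parse
import urllib.request

LOGIN_URL = "https://www.naturalgasintel.com/user/login"
FEED_ROOT = "https://naturalgasintel.com/ext/resources/Data-Feed/Daily-GPI/"


def feed_url(day):
    """Build the daily feed URL, e.g. .../2019/01/20190104td.txt."""
    return f"{FEED_ROOT}{day:%Y}/{day:%m}/{day:%Y%m%d}td.txt"


def login_payload(email, password):
    """URL-encode the login form fields used in the question's Twill code."""
    fields = {"user[email]": email, "user[password]": password, "commit": "Login"}
    return urllib.parse.urlencode(fields).encode("utf-8")


def fetch_feed(email, password, day):
    """Log in, then download the feed text with the same cookie jar."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    # Passing data makes this a POST; any session cookie set here is kept in jar.
    opener.open(LOGIN_URL, data=login_payload(email, password)).close()
    with opener.open(feed_url(day)) as response:
        # Ending up on the login page again means the login was not accepted.
        if "/user/login" in response.geturl():
            raise RuntimeError("redirected to the login page; login was not accepted")
        return response.read().decode("utf-8", errors="replace")
```

Usage would look like `text = fetch_feed(user_email, user_password, datetime.date.today())`. The returned text can then be parsed and written out in whatever format is needed.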

I would have upvoted this question.
