
I'm using Twill to retrieve pages that serve .txt data so I can store the data as an Excel file. The data is password protected, so I log in from the /user/login page first.

My problem is that when the code tries to fetch the text file after the login step, it gets the HTML of the login page rather than the .txt itself.

When I run the login:

from twill.commands import go, showforms, fv

path = "https://naturalgasintel.com/ext/resources/Data-Feed/Daily-GPI/"
end = "td.txt"

go("http://www.naturalgasintel.com/user/login")
showforms()
fv("2", "user[email]", user_email)
fv("2", "user[password]", user_password)
fv("2", "commit", "Login")

datafilelocation = path + year + "/" + month + "/" + date + end
go(datafilelocation)

When my code gets to go(datafilelocation) I get this:

==> at https://www.naturalgasintel.com/user/login?referer=%2Fext%2Fresources%2FData-Feed%2FDaily-GPI%2F2018%2F12%2F20181221td.txt
Out[18]: u'https://www.naturalgasintel.com/user/login?referer=%2Fext%2Fresources%2FData-Feed%2FDaily-GPI%2F2018%2F12%2F20181221td.txt'

So I end up back on the login page, with the file's path in the referer parameter, when what I really want is the page itself:

https://naturalgasintel.com/ext/resources/Data-Feed/Daily-GPI/2018/12/20181221td.txt
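The URL above follows the pattern path + YYYY/MM/YYYYMMDD + "td.txt". As a minimal sketch of how the year, month and date pieces could be derived from a single date object (Python 3; the helper name `build_data_url` is hypothetical and not part of my actual script):

```python
from datetime import date

BASE = "https://naturalgasintel.com/ext/resources/Data-Feed/Daily-GPI/"
SUFFIX = "td.txt"

def build_data_url(day):
    """Return the data-feed URL for the given date (hypothetical helper)."""
    # strftime zero-pads the month and day, matching 2018/12/20181221td.txt
    return BASE + day.strftime("%Y/%m/%Y%m%d") + SUFFIX

print(build_data_url(date(2018, 12, 21)))
```

Building the path from one date object avoids mismatches between separately formatted year, month and date strings.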

The reason I used fv("2", "commit", "Login") instead of submit() is that showforms() on the login page gives me this:

showforms()

Form name=quick-search (#1)
## ## __Name__________________ __Type___ __ID________ __Value__________________
1     q                        text      q            Search 


Form #2
## ## __Name__________________ __Type___ __ID________ __Value__________________
1     utf8                     hidden    (None)       ✓ 
2     authenticity_token       hidden    (None)       pnFnPGhMomX2Lyh7/U8iGOZKsiQnyicj7BWT ... 
3     referer                  hidden    (None)       https://www.naturalgasintel.com/ext/ ... 
4     popup                    hidden    (None)       false 
5     user[email]              text      user_email    
6     user[password]           password  user_pas ... 
7     user[remember_me]        hidden    (None)       0 
8     user[remember_me]        checkbox  user_rem ... None 
9     commit                   submit    (None)       Login 

Then, after I call submit(), it tells me:

Note: submit is using submit button: name="commit", value="Login"

What is the best way to solve this issue?

HelloToEarth
  • The issue is definitely that you're not being logged in properly, so when you request the data it redirects you back to `/user/login`. It passes the URL of the data in the `referer` parameter so that it can redirect you back to it once you've logged in. – cody Dec 21 '18 at 21:10
  • I tried to correct this by logging in from the main page, but only form #1 exists there, with no email or password fields. It only offers the quick-search form, so I have to log in from the user/login page. Is there any way to log in, save the cookies, and then go back to access the file? – HelloToEarth Dec 21 '18 at 21:13

1 Answer


If you'd be fine using Mechanize instead of Twill, give the following a shot:

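import mechanize

# Fill in your own credentials.
username = ""
password = ""
login_post_url = "http://www.naturalgasintel.com/user/login"
internal_url = "https://naturalgasintel.com/ext/resources/Data-Feed/Daily-GPI/2018/12/20181221td.txt"

browser = mechanize.Browser()
browser.open(login_post_url)
# Forms are zero-indexed, so nr=1 selects the second form on the page
# (the login form, shown as "Form #2" by Twill's showforms()).
browser.select_form(nr=1)
browser.form['user[email]'] = username
browser.form['user[password]'] = password
browser.submit()

# The browser keeps the session cookies, so this request is made as the logged-in user.
response = browser.open(internal_url)
print(response.read())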
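To tell whether the login actually worked, you can inspect the URL you end up on after requesting the file. Here is a minimal sketch (Python 3) based only on the redirect shown in the question; the function names `bounced_to_login` and `requested_path` are hypothetical helpers, not part of Mechanize or Twill:

```python
from urllib.parse import urlsplit, parse_qs

def bounced_to_login(final_url):
    """True if the final URL is the site's login page (hypothetical helper)."""
    return urlsplit(final_url).path.rstrip("/") == "/user/login"

def requested_path(final_url):
    """Return the decoded 'referer' query value, or None if it is absent."""
    values = parse_qs(urlsplit(final_url).query).get("referer")
    return values[0] if values else None

# The URL reported in the question after go(datafilelocation)
url = ("https://www.naturalgasintel.com/user/login"
       "?referer=%2Fext%2Fresources%2FData-Feed%2FDaily-GPI%2F2018%2F12%2F20181221td.txt")
print(bounced_to_login(url))
print(requested_path(url))
```

With Mechanize you could pass `browser.geturl()` to these helpers after `browser.open(internal_url)`. If `bounced_to_login` returns True, the session was not authenticated and the response body is the login page's HTML rather than the text file.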