Hey, I'm trying to log in to a website and get the HTML of the page shown after the login, and I can't figure out how to do it with Python. I'm using Python 2.7. I need to fill out the HTML form on this website:

'user' = 'magaleast' and 'password' = '1181' (real login details that are useless to me). The website then redirects the user to an authentication page, and when that is done it goes to the page I need.

Any ideas?

EDIT: I'm trying this code:

from mechanize import Browser
import cookielib
br = Browser()
br.open("http://www.shiftorganizer.com/")


cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)

# Browser options
br.set_handle_equiv(True)
br.set_handle_gzip(True)
br.set_handle_redirect(True)
br.set_handle_referer(True)

# You need to spot the name of the form in source code
br.select_form(name = "user")
# Spot the name of the inputs of the form that you want to fill, 
# say "username" and "password"
br.form["user"] = "magaleast"
br.form["password"] = "1181"

response = br.submit()
print response.read()

But I get:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>ShiftOrganizer סידור עבודה בפחות משניה</title>
    <meta http-equiv="content-type" content="text/html; charset=utf-8" />
<script type="text/javascript">
var emptyCompany=1
function subIfNewApp()
{
    if (emptyCompany){
        document.authenticationForm.action = document.getElementById('userName').value + "/authentication.asp"
    } else {
        document.authenticationForm.action = document.getElementById('Company').value + "/authentication.asp"
    }
    document.authenticationForm.submit()
}
</script>
</head>
    <body onload="subIfNewApp()">
    <form name="authenticationForm" method="post" action="">
        <input type="hidden" name="userName" id="userName" value="magaleast" />
        <input type="hidden" name="password" id="password" value="1181" />
        <input type="hidden" name="Company" id="Company" value="שם חברה" />
    </form>
    </body>
</html>

Is JS the problem? It stops at the authentication part again.

Nick Kobishev
  • You'll need to show the code you have. – Daniel Roseman May 19 '14 at 10:26
  • Possible duplicate http://stackoverflow.com/questions/4489550/how-to-get-an-html-file-using-python?rq=1 – ρss May 19 '14 at 11:43
  • have a look at the Requests library and in particular Session Objects – galinden May 19 '14 at 11:48
  • @ρss, it's not a duplicate, because I need to log in and that's the main problem. @Daniel Roseman I don't have any code because I tried using twill but didn't manage to get through the authentication part. I gave the website and the login info in case someone can tell me how to use it. – Nick Kobishev May 19 '14 at 12:33
  • Login will not be a difficult thing. Did you try Requests library or urllib2 module? You have to demonstrate that you at least tried something. – ρss May 19 '14 at 12:35
  • @ρss I'm writing from my phone, so I don't have a Python twill code sample. But I did manage to log in, got stuck on the authentication redirection part, and can't figure out how to continue. So I thought I might find some other way to go if I asked here. – Nick Kobishev May 19 '14 at 12:41

1 Answer

It seems that the website does indeed require some JavaScript, so the mechanize code at the end of this answer won't be enough. In this particular case, looking at the source code, it seems that this URL is used in the end:

http://shifto.shiftorganizer.com/magaleast/welcome.asp?password=1181

It seems to contain information similar to the page shown after login (although I can't read Hebrew, so I may be totally wrong). If so, you could simply do:

import urllib
url = 'http://shifto.shiftorganizer.com/*username*/welcome.asp?password=*password*'
print urllib.urlopen(url).read()
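
If the user name or password contains characters such as spaces or `&`, pasting them into the URL by hand would produce a broken request. The following is a minimal sketch of building the same URL with percent-encoding. The helper name `build_welcome_url` and the constant `BASE` are made up for illustration, and the imports are wrapped so the snippet runs on both Python 2.7 and Python 3.

```python
try:
    from urllib.parse import quote, urlencode  # Python 3
except ImportError:
    from urllib import quote, urlencode  # Python 2.7

# Host taken from the URL quoted in the answer above.
BASE = 'http://shifto.shiftorganizer.com'


def build_welcome_url(username, password):
    """Return the welcome.asp URL with both credentials percent-encoded.

    Hypothetical helper: the URL layout is copied from the answer,
    not from any documented API of the site.
    """
    # safe='' also escapes '/' so the user name stays a single path segment.
    path = '/%s/welcome.asp' % quote(username, safe='')
    return BASE + path + '?' + urlencode({'password': password})
```

The returned string can be passed to `urllib.urlopen` exactly like the hand-written `url` above.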

For reference, here is code to log in through a form that does not require JavaScript.

I would use the mechanize library (Requests would also work), doing something like:

import cookielib

from mechanize import Browser

br = Browser()
br.set_cookiejar(cookielib.LWPCookieJar())

# Browser options
br.set_handle_equiv(True)
br.set_handle_redirect(True)
br.set_handle_referer(True)

br.open("your url")

# You need to spot the name of the form in source code
br.select_form(name="form_name")  

# Spot the name of the inputs of the form that you want to fill, 
# say "username" and "password"
br.form["username"] = "magaleast"
br.form["password"] = "1181"

response = br.submit()
print response.read()
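
For the intermediate page shown in the question, the script `subIfNewApp()` only sets the form's `action` to `<userName or Company>/authentication.asp` and then submits the hidden fields, so that step can probably be emulated without a JavaScript engine. Below is a rough sketch under that assumption. `HiddenInputParser` and `emulate_sub_if_new_app` are hypothetical names, the page URL used to resolve the relative action has to come from the real response, and none of this has been tested against the site itself.

```python
import re

try:
    from html.parser import HTMLParser  # Python 3
    from urllib.parse import urljoin
except ImportError:
    from HTMLParser import HTMLParser  # Python 2.7
    from urlparse import urljoin


class HiddenInputParser(HTMLParser):
    """Collect the name/value pairs of every <input type="hidden">."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'input' and attrs.get('type') == 'hidden' and 'name' in attrs:
            self.fields[attrs['name']] = attrs.get('value') or ''


def emulate_sub_if_new_app(page_html, page_url):
    """Mimic subIfNewApp(): return (action_url, fields) for the POST."""
    parser = HiddenInputParser()
    parser.feed(page_html)
    fields = parser.fields

    # Read the emptyCompany flag from the inline script.
    # Assumption: if the flag is missing, treat it as truthy.
    match = re.search(r'var\s+emptyCompany\s*=\s*(\d+)', page_html)
    empty_company = match is None or match.group(1) != '0'

    # The script picks userName when the flag is truthy, else Company.
    key = 'userName' if empty_company else 'Company'

    # A browser resolves the relative action against the page's own URL.
    action_url = urljoin(page_url, fields[key] + '/authentication.asp')
    return action_url, fields
```

With mechanize, something like `action, fields = emulate_sub_if_new_app(response.read(), response.geturl())` followed by `br.open(action, urllib.urlencode(fields))` should send the same POST while keeping the session cookies. Whether the site accepts it, and which character encoding it expects for the Hebrew `Company` value, is not known from the page alone.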
Seb D.
  • Will it work with the redirection part? Because twill is based on mechanize and it stops there. – Nick Kobishev May 19 '14 at 13:10
  • I added some code to handle redirections and cookies (if any). It should work, but without testing and without knowing what the site will do, it's hard to tell. It's perfectly possible that the site requires some JavaScript, which mechanize cannot deal with. – Seb D. May 19 '14 at 13:27
  • Thanks. I'll try that ASAP and edit the question if there is still something I don't understand. Also, if JS is needed, which module should I use? – Nick Kobishev May 19 '14 at 14:00
  • That could be a much more complicated problem. [Selenium](https://pypi.python.org/pypi/selenium) has Python bindings and can trigger JS; [Ghost](http://jeanphix.me/Ghost.py/) could also work. You can have a look [here](http://stackoverflow.com/questions/17608572/web-scraping-dynamic-content-with-python) or [here](http://stackoverflow.com/questions/17540971/how-to-use-selenium-with-python). Although the best approach in those cases is usually to look at the JS, work out what it does, and emulate it directly if possible (for instance, by making the appropriate POST request yourself). – Seb D. May 19 '14 at 15:22
  • Edited the original question. I fixed some of the code you provided but I'm still stuck at the authentication. – Nick Kobishev May 20 '14 at 12:11
  • It works to some extent, but I don't get the correct end result; some toolbars are missing from the website itself. Is there any way to alter the 'action' part in `( )`? – Nick Kobishev May 20 '14 at 15:11
  • No, not really. At this point, either you dig further into the source code (both HTML and JS) to try to find a URL you can use to post your credentials (for instance, I also spotted [this one](http://shifto.shiftorganizer.com/magaleast/welcome.asp?password=1181&userId=295&userName=%F0%E9%F7%E9%E8%E4%20%F7%E5%E1%E9%F9%E1), but I think it will yield the same result), or you use another library that can run JavaScript. – Seb D. May 20 '14 at 15:50