
I'm trying to write a script in Python to grab all of the rosters in my fantasy football league, but you have to log in to ESPN first. The code I have is below. It looks like it's working when it runs -- i.e., I see the login page come up, I see it log in, and the page closes. But when I print the soup I don't see any team rosters. I saved the soup output as an HTML file to see what it is, and it's just the page redirecting me to log in again. Do I need to load the page through BS4 before I try to log in?

import time
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.support.ui import WebDriverWait # available since 2.4.0
from selenium.webdriver.support import expected_conditions as EC # available since 2.26.0
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
import urllib.request as urllib2
from bs4 import BeautifulSoup

driver = webdriver.Chrome()

driver.get("http://games.espn.go.com/ffl/signin")
# explicit wait is mandatory here: the login iframes load asynchronously
WebDriverWait(driver,1000).until(EC.presence_of_all_elements_located((By.XPATH,"(//iframe)")))
frms = driver.find_elements_by_xpath("(//iframe)")

driver.switch_to_frame(frms[2])
time.sleep(2)
driver.find_element_by_xpath("(//input)[1]").send_keys("username")
driver.find_element_by_xpath("(//input)[2]").send_keys("password")
driver.find_element_by_xpath("//button").click()
driver.switch_to_default_content()
time.sleep(4)
#driver.close()

# specify the url
roster_page = 'http://games.espn.com/ffl/leaguerosters?leagueId=11111'
# query the website and return the html to the variable 'page'
page = urllib2.urlopen(roster_page)
# parse the html using beautiful soup and store in variable `soup`
soup = BeautifulSoup(page, 'html.parser')
jbf
  • The requests you're executing via Selenium in the browser have nothing in common with the request you're making via `urllib`. Either pass the username/password with your HTTP request so you fetch the data as an authorized user (no Selenium required), or do the whole job in Selenium (note that Selenium has enough built-in methods for page scraping). – Andersson Sep 18 '18 at 10:20
    To be more specific, `cookies` are not shared between Selenium and urllib2, so when you make the request using urllib2 the webserver won't be able to detect your previous login. As others have stated, just stick with Selenium for all HTTP requests and you should be OK. – Ionut Ticus Sep 18 '18 at 10:25
  • @IonutTicus: you seem to have hit the point. I have the same issue here too: https://stackoverflow.com/questions/62595317/login-to-page-with-selenium-works-parsing-with-bs4-works-but-not-the-combina Any idea how to fix it? Looking forward to hearing from you. Regards – zero Jul 02 '20 at 23:10
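The cookie point above is the crux: if you really do want to mix Selenium with a plain HTTP library, you have to copy the login cookies over yourself. A minimal sketch of that idea, using `requests` instead of urllib2 (the helper name `cookies_to_session` is illustrative, not from the question):

```python
import requests


def cookies_to_session(selenium_cookies):
    """Copy cookies from Selenium's driver.get_cookies() list
    into a requests.Session, so later plain-HTTP requests are
    seen by the server as the logged-in browser session."""
    session = requests.Session()
    for cookie in selenium_cookies:
        session.cookies.set(cookie['name'], cookie['value'],
                            domain=cookie.get('domain'))
    return session


# After logging in with Selenium:
# session = cookies_to_session(driver.get_cookies())
# page = session.get('http://games.espn.com/ffl/leaguerosters?leagueId=11111').text
```

Whether this works depends on the site; some logins also check headers or tokens, so staying inside Selenium (as both answers suggest) is the simpler route.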

2 Answers


You are using Selenium to log in and then using urllib2 to open the URL, which makes the request in a completely separate session that has none of Selenium's login cookies. Get the page source from the Selenium webdriver itself and feed that to BeautifulSoup, and it should work.

Dakshinamurthy Karra
  • Good evening dear Dakshinamurthy Karra: many thanks for stepping up to the plate with this great answer. I got stuck with exactly the same issue: cf. https://stackoverflow.com/questions/62595317/login-to-page-with-selenium-works-parsing-with-bs4-works-but-not-the-combina I think I have to dig deeper into your ideas and approach. Looking forward to hearing from you. Regards, zero – zero Jul 02 '20 at 23:15

Try this instead of urllib2:

driver.get("http://games.espn.com/ffl/leaguerosters?leagueId=11111")
# query the website and return the html to the variable 'page'
page = driver.page_source
# parse the html using beautiful soup and store in variable 'soup'
soup = BeautifulSoup(page, 'html.parser')
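Once the soup is built from `driver.page_source`, the rosters can be pulled out with ordinary BeautifulSoup calls. A minimal sketch, assuming the rosters are rendered as plain HTML `<table>` elements (the exact markup and classes on ESPN's page may differ):

```python
from bs4 import BeautifulSoup


def extract_tables(html):
    """Return every <table> in the page as a list of rows,
    each row being a list of cell strings."""
    soup = BeautifulSoup(html, 'html.parser')
    tables = []
    for table in soup.find_all('table'):
        rows = []
        for tr in table.find_all('tr'):
            cells = [cell.get_text(strip=True)
                     for cell in tr.find_all(['td', 'th'])]
            rows.append(cells)
        tables.append(rows)
    return tables


# With the driver from above, after logging in:
# for rows in extract_tables(driver.page_source):
#     print(rows)
```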
anish