I'm trying to scrape www.listindiario.com with BeautifulSoup, but the site redirects to its root and urllib2 fails because the request is always redirected. I don't know how to reach the home page so that I can scrape it.
I want to access the home page and not the splash page. Any suggestions?
I understand that the code is not optimized, but I just want to know how to skip that redirect.
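import re
import urllib2
from httplib import responses

from bs4 import BeautifulSoup

key = 'la'
# Assumed value: the question does not show how hdr was defined.
hdr = {'User-Agent': 'Mozilla/5.0'}
request = urllib2.Request('http://www.listindiario.com', headers=hdr)
try:
    htmlfile = urllib2.urlopen(request)
except urllib2.URLError as e:
    # HTTPError is a URLError subclass that also carries an HTTP status code.
    if hasattr(e, 'code'):
        print 'The server could not fulfill the request.'
        print 'Error code: ', e.code
        if e.code in responses:
            print 'Reason: ', responses[e.code]
    elif hasattr(e, 'reason'):
        print 'Failed to reach the server.'
        print 'Reason: ', e.reason
else:
    # Runs only when the request succeeded.
    soup = BeautifulSoup(htmlfile, 'html.parser')
    print soup
    print 'URL: ', htmlfile.geturl()
    for resultado in soup.find_all('a', href=True, text=re.compile(key)):
        print "Found! <>", resultado['href']
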
All it prints is:

Object moved to here.
Why are you using urllib2 instead of requests? – papabomay Nov 27 '14 at 19:31
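
One possible cause of a redirect that never ends is a server that sets a cookie on the splash page and then redirects back, expecting that cookie on the next request. Whether www.listindiario.com actually works this way is an assumption, not something the question confirms. If it does, an opener that keeps cookies between requests may get through. Below is a minimal sketch using only the standard library, written for Python 3, where urllib2 became urllib.request. The names build_cookie_opener and fetch are hypothetical helpers, and the User-Agent value is a placeholder because the original hdr dict was not shown.

```python
# Python 3 sketch: urllib2 from the question is urllib.request here.
import http.cookiejar
import urllib.request


def build_cookie_opener(user_agent="Mozilla/5.0"):
    """Return (opener, jar); the opener stores cookies and resends them.

    Hypothetical helper. The default user_agent is a placeholder value.
    """
    jar = http.cookiejar.CookieJar()
    # HTTPCookieProcessor records Set-Cookie headers, including those on
    # redirect responses, and attaches the cookies to follow-up requests.
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    opener.addheaders = [("User-Agent", user_agent)]
    return opener, jar


def fetch(url, opener, timeout=10):
    """Open url, following redirects, and return (final_url, body_bytes)."""
    with opener.open(url, timeout=timeout) as response:
        return response.geturl(), response.read()
```

To try it, call opener, jar = build_cookie_opener() and then fetch('http://www.listindiario.com', opener). The returned final URL shows where the redirects ended, and iterating over jar shows which cookies the server set on the way. If the body is still "Object moved to here.", the redirect probably depends on something other than cookies.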