- Go to this url https://www.horseracebase.com/horse-racing-results.php?year=2005&month=3&day=15 (username = TrickyBen | password = TrickyBen123)
- Notice that there is a Download Excel button (in Red)
- I want to download the excel file and turn it into a pandas dataframe. I want to do it programatically (ie. from the script, not by manually clicking around the website). How would I do this?
This code will get you logged in as TrickyBen, and make a request to the website API...
import requests from lxml import html from requests import Session import pandas as pd import shutil
raceSession = Session()
LoginDetails = {'login': 'TrickyBen', 'password': 'TrickyBen123'}
LoginUrl = 'https://www.horseracebase.com/horse-racing-results.php?year=2005&month=3&day=15/horsebase1.php'
LoginPost = raceSession.post(LoginUrl, data=LoginDetails)
RaceUrl = 'https://www.horseracebase.com/excelresults.php'
RaceDataDetails =  {"user": "41495", "racedate": "2005-3-15", "downloadbutton": "Excel"}
PostHeaders = {"Content-Type": "application/x-www-form-urlencoded"}
Response = raceSession.post(RaceUrl, data=RaceDataDetails, headers=PostHeaders)
Table = pd.read_table(Response.text)
Table.to_csv('blahblah.csv')
If you inspect element, you'll notice that the relevant element looks like this...
<form action="excelresults.php" method="post">
    <input type="hidden" name="user" value="41495">
    <input type="hidden" name="racedate" value="2005-3-15">
    <input type="submit" class="downloadbutton" value="Excel">
</form>
I get this error message...
Traceback (most recent call last):
  File "/Users/Alex/Desktop/DateTest/hrpull.py", line 20, in <module>
    Table = pd.read_table(Response.text)
  File "/Library/Python/2.7/site-packages/pandas/io/parsers.py", line 562, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/Library/Python/2.7/site-packages/pandas/io/parsers.py", line 315, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/Library/Python/2.7/site-packages/pandas/io/parsers.py", line 645, in __init__
    self._make_engine(self.engine)
  File "/Library/Python/2.7/site-packages/pandas/io/parsers.py", line 799, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
  File "/Library/Python/2.7/site-packages/pandas/io/parsers.py", line 1213, in __init__
self._reader = _parser.TextReader(src, **kwds)
  File "pandas/parser.pyx", line 358, in pandas.parser.TextReader.__cinit__ (pandas/parser.c:3427)
  File "pandas/parser.pyx", line 628, in pandas.parser.TextReader._setup_parser_source (pandas/parser.c:6861)
IOError: File race_date race_time   track   race_name       race_restrictions_age   race_class  major   race_distance   prize_money     going_description   number_of_runners   place   distbt  horse_name  stall       trainer horse_age   jockey_name jockeys_claim   pounds  odds    fav     official_rating comptime    TotalDstBt  MedianOR    Dist_Furlongs       placing_numerical   RCode   BFSP    BFSP_Place  PlcsPaid    BFPlcsPaid      Yards   RailMove    RaceType    
"2005-03-15"    "14:00:00"  "Cheltenham"    "Letheby & Christopher Supreme Novices Hurdle " "4yo+"  "Class 1"   "Grade 1"   "2m˝f " "58000" "Good"  "20"    "1st"       "Arcalis"   "0" "Johnson, J Howard" "5" "Lee, G"    "0" "161"   "21"        "136"   "3 mins 53.00s"     "121.5" "16.5"  "1" "National Hunt" "0" "0" "3" "0" "0" "0" "Novices Hurdle"
"2005-03-15"    "14:00:00"  "Cheltenham"    "Letheby & Christopher Supreme Novices Hurdle " "4yo+"  "Class 1"   "Grade 1"   "2m˝f " "58000" "Good"  "20"    "2nd"   "6" "Wild Passion (GER)"    "0" "Meade, Noel"   "5" "Carberry, P"   "0" "161"   "11"        "0" "3 mins 53.00s" "6" "121.5" "16.5"  "2" "National Hunt" "0" "0" "3" "0" "0" "0" "Novices Hurdle"
 
     
    
