I am new to python and even newer to pandas, but relatively well versed in R. I am using Anaconda, with Python 3.5 and pandas 0.18.1. I am trying to read in an excel file as a dataframe. The file admittedly is pretty... ugly. There is a lot of empty space, missing headers, etc. (I am not sure if this is the source of any issues)
I create the file object, then find the appropriate sheet, then try to read that sheet as a dataframe:
xl = pd.ExcelFile(allFiles[i])
sName = [s for s in xl.sheet_names if 'security exposure' in s.lower()]
df = xl.parse(sName)
df
Results:
{'Security exposure - 21 day lag':          Percent of Total Holdings  \
 0      KMNFC vs. 3 Month LIBOR AUD   
 1                        04-OCT-16   
 2                Australian Dollar   
 3                              NaN   
 4                              NaN   
 5                              NaN   
 6                              NaN   
 7                              NaN   
 8          Long/Short Net Exposure   
 9                            Total   
 10                             NaN   
 11                            Long   
 12                             NaN   
 13                             NaN   
 14                             NaN   
 15                             NaN   
 16                             NaN   
 17                             NaN  
(This goes on for 20-30 more rows and 5-6 more columns)
I am using Anaconda, and Spyder, which has a 'Variable Explorer'. It shows the variable df to be a dict of the DataFrame type:
However, I cannot use iloc:
df.iloc[:,1]
Traceback (most recent call last):
  File "<ipython-input-77-d7b3e16ccc56>", line 1, in <module>
    df.iloc[:,1]
AttributeError: 'dict' object has no attribute 'iloc'
Any thoughts? What am I missing?
EDIT:
To be clear, what I am really trying to do is reference the first column of the df. In R this would be df[,1]. Looking around it seems to be not a very popular way to do things, or not the 'correct' way. I understand why indexing by column names, or keys, is better, but in this situation, I really just need to index the dataframes by column numbers. Any working method of doing that would be greatly appreciated.
EDIT (2):
Per a suggestion, I tried 'read_excel', with the same results:
df = pd.ExcelFile(allFiles[i]).parse(sName)
df.loc[1]
Traceback (most recent call last):
  File "<ipython-input-90-fc40aa59bd20>", line 2, in <module>
    df.loc[1]
AttributeError: 'dict' object has no attribute 'loc'
df = pd.read_excel(allFiles[i], sheetname = sName)
df.loc[1]
Traceback (most recent call last):
  File "<ipython-input-91-72b8405c6c42>", line 2, in <module>
    df.loc[1]
AttributeError: 'dict' object has no attribute 'loc'

 
     
    