Let's say I have the following Excel file to be read:
What I want is a simple solution (preferably a one-liner) that reads the Excel file so that the dates come out as str (or at least int), and the blank cells become NaN/NaT or anything else that pd.isnull can detect.
If I use df = pd.read_excel(file_path), what I get is
df
Out[8]:
001002.XY 600123.AB 123456.YZ 555555.GO
ipo_date 20100203.0 20150605 NaN 20090501.0
delist_date NaN 20170801 NaN NaN
So pandas recognised the blank cells as NaN, which is fine, but my pet peeve is that all the other values are forced to float64, even though they are meant to be plain str or int. (Edit: it seems that if a column, e.g. column 1 above, has no NaNs, then its values are not forced to float. In my case, however, most columns have delist_date blank, since most stocks have an IPO date but have not been delisted yet.)
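(As far as I understand, this is simply how pandas works: NaN is itself a float, so any otherwise-integer column that contains a blank gets upcast to float64. A tiny self-contained illustration, not using my actual file:

import numpy as np
import pandas as pd

# a column of ints stays int64...
print(pd.Series([20100203, 20150605]).dtype)   # int64
# ...but a single NaN upcasts the whole column to float64
print(pd.Series([20100203, np.nan]).dtype)     # float64
)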
From what I know, I also tried the dtype=str keyword argument, and it gives me
df
Out[10]:
001002.XY 600123.AB 123456.YZ 555555.GO
ipo_date 20100203 20150605 nan 20090501
delist_date nan 20170801 nan nan
Looks good? True, the dates are now str, but the ridiculous part is that the NaNs have become literal strings! E.g.
df.iloc[1, 0]
Out[12]:
'nan'
which would force me to add an awkward df.replace step later on.
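(By "awkward" I mean roughly this kind of untested two-step sketch, where file_path is my spreadsheet:

import numpy as np
import pandas as pd

# read everything as strings, then turn the literal 'nan' strings back into real NaN
df = pd.read_excel(file_path, dtype=str).replace('nan', np.nan)
)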
I haven't tried the converters argument because it would require specifying the datatype column by column, and the actual Excel file I'm working with is a very wide spreadsheet (roughly 3k columns). I don't want to transpose the spreadsheet in Excel itself either. (The closest I got is sketched below.)
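(For completeness, the best converters-based idea I had was reading the file once just to discover the ~3k column names and then reading it again with a converter per column, which feels wasteful, and I'm not even sure the blanks would come back as NaN rather than empty strings. A rough, untested sketch:

import pandas as pd

# first pass only to collect the ~3k column names
cols = pd.read_excel(file_path).columns
# second pass forcing every column through str
df = pd.read_excel(file_path, converters={col: str for col in cols})
)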
Could anybody help? Thanks in advance.
