When reading a large hdf file with pandas.read_hdf() I get extremely slow read time. My hdf has 50 million rows, 3 columns with integers and 2 with strings. Writing this using to_hdf() with table format and indexing took almost 10 minutes. While this is also slow, I am not too concerned as read speed is more important.
I have tried saving as fixed/table format, with/without compression, however the read time ranges between 2-5 minutes. By comparison, read_csv() on the same data takes 4 minutes.
I have also tried to read the hdf using pytables directly. This is much faster at 6 seconds and this would be the speed I would like to see.
h5file = tables.open_file("data.h5", "r")
table = h5file.root.data.table.read()
I noticed all the speed comparisons in the documentation use only numeric data and running these myself achieved similar performance.
I would like to ask whether there is something I can do to optimise read performance?
Edit
Here is a sample of the data
               col_A     col_B    col_C     col_D                 col_E
30649671  1159660800  10217383        0  10596000                LACKEY
26198715  1249084800   0921720        0         0           KEY CLIFTON
19251910   752112000   0827092      104    243000                WEMPLE
47636877  1464739200  06247715        0         0                 FLOYD
14121495  1233446400  05133815        0    988000        OGU ALLYN CH 9
41171050  1314835200  7C140009        0     39000             DEBERRY A
45865543  1459468800   0314892       76    254000               SABRINA
13387355   970358400  04140585       19   6956000              LA PERLA
4186815    849398400  02039719        0  19208000  NPU UNIONSPIELHAGAN1
32666568   733622400  10072006        0   1074000                 BROWN
And info on the dataframe:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 52046850 entries, 0 to 52046849
Data columns (total 5 columns):
col_A        int64
col_B        object
col_C        int64
col_D        int64
col_E        object
dtypes: int64(3), object(2)
memory usage: 1.9+ GB