In [28]: df = DataFrame({ 'A' : np.random.rand(5),
'B' : range(5),
'C' : date_range('20130101',periods=5,freq='T')})
In [29]: df
Out[29]:
A B C
0 0.067509 0 2013-01-01 00:00:00
1 0.872840 1 2013-01-01 00:01:00
2 0.379634 2 2013-01-01 00:02:00
3 0.552827 3 2013-01-01 00:03:00
4 0.996150 4 2013-01-01 00:04:00
[5 rows x 3 columns]
In [30]: df.dtypes
Out[30]:
A float64
B int64
C datetime64[ns]
dtype: object
Write it out in Table format.
In [32]: df.to_hdf('test.h5','df',mode='w',format='table')
Show the internal structure of the file:
In [33]: !ptdump -avd test.h5
/ (RootGroup) ''
/._v_attrs (AttributeSet), 4 attributes:
[CLASS := 'GROUP',
PYTABLES_FORMAT_VERSION := '2.1',
TITLE := '',
VERSION := '1.0']
/df (Group) ''
/df._v_attrs (AttributeSet), 14 attributes:
[CLASS := 'GROUP',
TITLE := '',
VERSION := '1.0',
data_columns := [],
encoding := None,
index_cols := [(0, 'index')],
info := {1: {'type': 'Index', 'names': [None]}, 'index': {}},
levels := 1,
nan_rep := 'nan',
non_index_axes := [(1, ['A', 'B', 'C'])],
pandas_type := 'frame_table',
pandas_version := '0.10.1',
table_type := 'appendable_frame',
values_cols := ['values_block_0', 'values_block_1', 'values_block_2']]
/df/table (Table(5,)) ''
description := {
"index": Int64Col(shape=(), dflt=0, pos=0),
"values_block_0": Float64Col(shape=(1,), dflt=0.0, pos=1),
"values_block_1": Int64Col(shape=(1,), dflt=0, pos=2),
"values_block_2": Int64Col(shape=(1,), dflt=0, pos=3)}
byteorder := 'little'
chunkshape := (2048,)
autoindex := True
colindexes := {
"index": Index(6, medium, shuffle, zlib(1)).is_csi=False}
/df/table._v_attrs (AttributeSet), 19 attributes:
[CLASS := 'TABLE',
FIELD_0_FILL := 0,
FIELD_0_NAME := 'index',
FIELD_1_FILL := 0.0,
FIELD_1_NAME := 'values_block_0',
FIELD_2_FILL := 0,
FIELD_2_NAME := 'values_block_1',
FIELD_3_FILL := 0,
FIELD_3_NAME := 'values_block_2',
NROWS := 5,
TITLE := '',
VERSION := '2.7',
index_kind := 'integer',
values_block_0_dtype := 'float64',
values_block_0_kind := ['A'],
values_block_1_dtype := 'int64',
values_block_1_kind := ['B'],
values_block_2_dtype := 'datetime64',
values_block_2_kind := ['C']]
Data dump:
[0] (0, [0.06750856214219292], [0], [1356998400000000000])
[1] (1, [0.8728395428343044], [1], [1356998460000000000])
[2] (2, [0.37963409103250334], [2], [1356998520000000000])
[3] (3, [0.5528271410494643], [3], [1356998580000000000])
[4] (4, [0.9961498806897623], [4], [1356998640000000000])
datetime64[ns] values are serialized as nanoseconds since the epoch in UTC and stored in an int64 column (the same way NumPy stores the underlying data). So it's pretty straightforward to read this in, as it is standard HDF5 format. You would, however, need to interpret the metadata. See the source file pandas/io/pytables.py.
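As a minimal sketch of what those int64 values mean outside pandas, the following decodes the raw numbers from the data dump above using only the Python standard library. The helper name `ns_to_datetime` is made up for illustration; the same arithmetic should carry over to IDL or MATLAB.

```python
from datetime import datetime, timedelta, timezone

# The Unix epoch, as a timezone-aware UTC datetime.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def ns_to_datetime(ns_since_epoch):
    """Convert int64 nanoseconds since the epoch to an aware UTC datetime.

    Hypothetical helper. Python's datetime only has microsecond
    resolution, so any sub-microsecond digits are dropped.
    """
    seconds, ns_remainder = divmod(ns_since_epoch, 1_000_000_000)
    return EPOCH + timedelta(seconds=seconds, microseconds=ns_remainder // 1000)


# Raw values_block_2 entries copied from the ptdump data dump.
raw = [
    1356998400000000000,
    1356998460000000000,
    1356998520000000000,
    1356998580000000000,
    1356998640000000000,
]

decoded = [ns_to_datetime(v) for v in raw]
for d in decoded:
    print(d.isoformat())
```

The decoded values are one minute apart starting at 2013-01-01 00:00:00 UTC, which matches column C of the original frame.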
Basically, you would look for the blocks marked as datetime64 (values_block_2_dtype in the dump above); the matching _kind attribute lists the names of the columns in that block. Then you can reverse the conversion in IDL/MATLAB (in pandas you would do pd.to_datetime(ns_since_epoch, unit='ns')). Timezones are a bit trickier, as the values are in UTC and the timezone is stored in the info attribute.
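The lookup described above can be sketched roughly as follows. This assumes the table attributes have already been read into a plain dict by whatever HDF5 reader you use; the dict here is copied by hand from the /df/table._v_attrs listing, and `datetime_blocks` is a hypothetical helper, not part of pandas or PyTables.

```python
# Attributes copied from the /df/table._v_attrs listing, as a plain dict.
# With a real file you would read these through your HDF5 library instead.
table_attrs = {
    "index_kind": "integer",
    "values_block_0_dtype": "float64",
    "values_block_0_kind": ["A"],
    "values_block_1_dtype": "int64",
    "values_block_1_kind": ["B"],
    "values_block_2_dtype": "datetime64",
    "values_block_2_kind": ["C"],
}


def datetime_blocks(attrs):
    """Map each datetime64 block name to the column names it holds.

    Hypothetical helper: finds '<block>_dtype' attributes that start with
    'datetime64' and pairs each with its '<block>_kind' attribute.
    """
    suffix = "_dtype"
    found = {}
    for key, dtype in attrs.items():
        if key.endswith(suffix) and str(dtype).startswith("datetime64"):
            block = key[: -len(suffix)]
            found[block] = list(attrs.get(block + "_kind", []))
    return found


blocks = datetime_blocks(table_attrs)
print(blocks)
```

Each block name returned is a column of the /df/table dataset whose int64 values should be interpreted as nanoseconds since the epoch.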
Note: the metadata is interpreted slightly differently for a Fixed format, or if you have data_columns (but this is not very difficult to handle).