I prepared this answer for this question, which was erroneously marked as a duplicate of this one.
The best method for speed is not the best method for portability or fidelity. Pickle is fast and faithful, but neither portable nor archival-safe. HDF is portable and archival-safe, but slower, and it can only store DataFrames with certain formats and structures.
Summary:
- For sharing and archiving simple tables, where some changes in format are tolerable:
csv, excel, or json, depending on your application.
- For perfect save-and-restore, but no portability or archival safety:
pickle
- For archiving:
hdf, but not all tables can be saved portably or losslessly in this format. You may need to restructure things and convert some types.
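To see what "some changes in format are tolerable" means in practice, here is a minimal sketch of a csv round trip losing a dtype (the column names and values are arbitrary examples):

```python
import io
import pandas as pd

# A column whose dtype a CSV round trip cannot preserve.
df = pd.DataFrame({"grade": pd.Categorical(["a", "b", "a"])})

# CSV keeps the values but forgets the categorical dtype.
buf = io.StringIO()
df.to_csv(buf, index=False)
from_csv = pd.read_csv(io.StringIO(buf.getvalue()))

print(df["grade"].dtype)        # category
print(from_csv["grade"].dtype)  # object
```

The values survive, but anything the text format can't express (dtypes, the index, attrs) has to be reconstructed by hand on read.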
Details: We'd like a method that pandas already supports with both a .to_format method on the DataFrame class and a read_format function in the pandas module. In pandas 1.5.2 these are: csv, excel, feather, gbq, hdf, html, json, orc, parquet, pickle, sql, stata, and xml.
- The formats
excel and csv are highly portable and nice for simple tables. Complicated tables and data structures won't survive the round trip.
json is also highly portable, but it will change the data in the table: NaNs are converted to None, numpy arrays may become nested lists, etc.
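A small sketch of both json quirks (the column names are arbitrary):

```python
from io import StringIO

import numpy as np
import pandas as pd

# NaN is written as JSON null; a NumPy array in a cell comes back as a plain list.
df = pd.DataFrame({"x": [1.5, np.nan],
                   "v": [np.array([1, 2]), np.array([3, 4])]})

text = df.to_json()
print(text)  # the NaN appears as null in the JSON text

restored = pd.read_json(StringIO(text))
print(type(restored.loc[0, "v"]))  # list, not ndarray
```

The table is still readable by any JSON consumer, but the round trip is not an identity.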
- I'll skip
feather, gbq, orc, parquet, sql, and stata. These are specific formats that are not wholly compatible with the DataFrame format; they are either not very portable or not very flexible. I'll also skip html, which can't faithfully save and restore all of the details of a DataFrame.
pickle is the easiest to use for a faithful save/restore. However, it is neither portable nor archival-safe: expect pickle files to fail to load correctly in future versions.
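For comparison with the csv example above, a pickle round trip really is an identity (writing to an in-memory buffer here to keep the sketch self-contained):

```python
import io

import pandas as pd

# Pickle restores the DataFrame exactly, dtypes and all.
df = pd.DataFrame({"grade": pd.Categorical(["a", "b", "a"]),
                   "score": [1.0, float("nan"), 3.0]})

buf = io.BytesIO()
df.to_pickle(buf)
buf.seek(0)
restored = pd.read_pickle(buf)

assert restored.equals(df)                       # exact round trip
assert str(restored["grade"].dtype) == "category"
```

The catch is that this only holds on a compatible pandas/Python version; a pickle written today may not load years from now.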
- This leaves
hdf. This should be an archival-safe and highly portable format; many scientific applications read and store HDF files. However, python will still need to pickle any DataFrame contents that can't be converted to native C types, and those pickled pieces lose the portability guarantee.
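A minimal sketch of the portable case, assuming the optional PyTables dependency is installed (the file name and key are arbitrary; format="table" keeps the on-disk layout queryable):

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Columns with native C-typed dtypes store portably in HDF5.
df = pd.DataFrame({"x": np.arange(3), "y": [0.1, 0.2, 0.3]})

path = os.path.join(tempfile.mkdtemp(), "store.h5")
df.to_hdf(path, key="df", mode="w", format="table")
restored = pd.read_hdf(path, "df")

assert restored.equals(df)
```

If the frame held arbitrary Python objects instead, pandas would warn and fall back to pickling those columns inside the HDF file, which is exactly the restructuring-and-converting work mentioned above.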