When using PyTables, there's no support (as far as I can tell) for the scipy.sparse matrix formats, so to store a matrix I have to do some conversion, e.g.:
```python
from scipy import sparse

def store_sparse_matrix(self, M):
    csr = M.tocsr()  # convert once, not once per component
    f = self.getFileHandle()
    grp = f.createGroup(self.getGroup(), 'M')  # PyTables 2.x names; 3.x uses create_group/create_array
    for name in ('data', 'indptr', 'indices'):
        f.createArray(grp, name, getattr(csr, name))
    f.createArray(grp, 'shape', csr.shape)  # without it, trailing empty columns are lost

def get_sparse_matrix(self):
    g = self.getGroup().M
    # [:] reads each array node into memory as a NumPy array
    return sparse.csr_matrix((g.data[:], g.indices[:], g.indptr[:]), shape=tuple(g.shape[:]))
```
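The conversion itself does not depend on PyTables: a CSR matrix is fully described by its `data`, `indices` and `indptr` arrays plus its shape. The sketch below shows that round trip with plain NumPy arrays; `split_csr` and `join_csr` are hypothetical helper names, and each array in the returned dict is what one would hand to `createArray`. It also shows why the shape is worth storing: without it, SciPy infers the column count as `max(indices) + 1`.

```python
import numpy as np
from scipy import sparse

def split_csr(M):
    """Break a sparse (or dense) matrix into plain NumPy arrays (hypothetical helper)."""
    csr = sparse.csr_matrix(M)  # accepts any scipy.sparse format or a dense array
    return {
        "data": csr.data,
        "indices": csr.indices,
        "indptr": csr.indptr,
        "shape": np.array(csr.shape),  # needed to recover trailing all-zero columns
    }

def join_csr(parts):
    """Rebuild the CSR matrix from the arrays produced by split_csr."""
    return sparse.csr_matrix(
        (parts["data"], parts["indices"], parts["indptr"]),
        shape=tuple(int(n) for n in parts["shape"]),
    )

M = sparse.csr_matrix(np.array([[1, 0, 0], [0, 2, 0]]))  # third column is all zeros
parts = split_csr(M)
print(join_csr(parts).shape)  # (2, 3)
# Without an explicit shape, SciPy infers 2 columns and the empty one is lost
print(sparse.csr_matrix((parts["data"], parts["indices"], parts["indptr"])).shape)  # (2, 2)
```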
The trouble is that `get_sparse_matrix` takes some time (it reads everything from disk) and, if I understand correctly, also requires the whole matrix to fit into memory.
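One way to avoid loading the whole matrix, sketched here as an alternative rather than anything PyTables provides for sparse data, is to read only a block of rows. In CSR, `indptr[start:stop + 1]` gives the offsets of exactly those rows, so only that slice of `data` and `indices` is needed. Slicing a PyTables `Array` node reads just the selected range from disk, so the same function should work with the stored nodes; here NumPy arrays stand in for them. `read_csr_rows` is a hypothetical name, and `n_cols` is assumed to be known (for example, stored alongside the arrays).

```python
import numpy as np
from scipy import sparse

def read_csr_rows(data, indices, indptr, start, stop, n_cols):
    """Return rows start:stop (0 <= start <= stop <= n_rows) of a stored CSR matrix.

    data, indices and indptr may be anything whose slices return NumPy arrays,
    e.g. NumPy arrays or PyTables Array nodes.
    """
    # indptr[start:stop + 1] holds the offsets of exactly the requested rows
    ptr = np.asarray(indptr[start:stop + 1])
    lo, hi = int(ptr[0]), int(ptr[-1])
    # Only the nonzeros belonging to those rows are sliced out
    sub_data = np.asarray(data[lo:hi])
    sub_indices = np.asarray(indices[lo:hi])
    # Shift the offsets so the first selected row starts at position 0
    return sparse.csr_matrix((sub_data, sub_indices, ptr - lo),
                             shape=(stop - start, n_cols))

M = sparse.csr_matrix(np.array([[1, 0, 2], [0, 0, 0], [3, 4, 0], [0, 5, 6]]))
rows = read_csr_rows(M.data, M.indices, M.indptr, 1, 3, M.shape[1])
print(rows.toarray())
# [[0 0 0]
#  [3 4 0]]
```

Iterating over row blocks this way keeps peak memory proportional to the block size rather than to the whole matrix.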
The only other option seems to be converting the matrix to dense format (a NumPy array) and then using PyTables normally. However, this seems rather inefficient, unless PyTables' own compression would take care of that?
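To put rough numbers on "inefficient", the sketch below compares the uncompressed size of a dense array with the size of the three CSR arrays; `dense_nbytes` and `csr_nbytes` are hypothetical helpers. These are in-memory sizes only. PyTables can compress chunked arrays (such as a `CArray` created with `tables.Filters`), which might shrink a mostly-zero dense array a lot on disk, but reading it back in full would still materialize every `rows * cols` value in memory.

```python
import numpy as np
from scipy import sparse

def dense_nbytes(shape, dtype=np.float64):
    """Uncompressed bytes for the matrix as a dense NumPy array."""
    rows, cols = shape
    return rows * cols * np.dtype(dtype).itemsize

def csr_nbytes(M):
    """Uncompressed bytes for the three CSR component arrays."""
    csr = sparse.csr_matrix(M)
    return csr.data.nbytes + csr.indices.nbytes + csr.indptr.nbytes

# A 10,000 x 10,000 diagonal matrix: only 10,000 nonzeros
M = sparse.eye(10000, format="csr")
print(dense_nbytes(M.shape, M.dtype))  # 800000000
print(csr_nbytes(M))
```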