I'm writing a machine learning algorithm on huge, sparse data: my matrix has shape (347, 5 416 812 801) but is very sparse, with only 0.13% of the entries being non-zero.
The sparse matrix takes 105 000 bytes (< 1 MB) of memory and is of CSR type.
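For reference, this is roughly how I measured that size (a sketch with made-up toy data, since the real matrix is built elsewhere; I'm assuming scipy.sparse here):

```python
import numpy as np
from scipy.sparse import csr_matrix

# Toy stand-in for the real matrix (same row count, far fewer columns)
data = np.ones(100)
rows = np.zeros(100, dtype=int)
cols = np.arange(100)
matrix = csr_matrix((data, (rows, cols)), shape=(347, 100_000))

# Total memory of the three arrays backing a CSR matrix
size_bytes = matrix.data.nbytes + matrix.indices.nbytes + matrix.indptr.nbytes
print(size_bytes)
```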
I'm trying to separate train/test sets by choosing a list of example indices for each, i.e. to split my dataset in two using:

training_set = matrix[train_indices]  # shape (len(train_indices), 5 416 812 801), still sparse
testing_set = matrix[test_indices]    # shape (347 - len(train_indices), 5 416 812 801), also sparse

where train_indices and test_indices are two lists of ints.
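To make the setup concrete, here is a minimal sketch of the split (with a much smaller, made-up column count so it runs anywhere; the indexing pattern is the same as on my real matrix):

```python
import numpy as np
from scipy.sparse import csr_matrix

rng = np.random.default_rng(0)

# Small stand-in: 347 rows as in the real data, but only 100 000 columns
rows = rng.integers(0, 347, size=1000)
cols = rng.integers(0, 100_000, size=1000)
data = rng.random(1000)
matrix = csr_matrix((data, (rows, cols)), shape=(347, 100_000))

# Shuffle the row indices, then split them into train and test lists
all_indices = np.arange(matrix.shape[0])
rng.shuffle(all_indices)
train_indices = all_indices[:250].tolist()
test_indices = all_indices[250:].tolist()

# Fancy row indexing on a CSR matrix returns a (still sparse) CSR matrix
training_set = matrix[train_indices]
testing_set = matrix[test_indices]
print(training_set.shape, testing_set.shape)
```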
But training_set = matrix[train_indices] fails and returns Segmentation fault (core dumped).
It's probably not a memory problem, as I'm running this code on a server with 64 GB of RAM.
Any clue as to what the cause could be?