I am working with binary (only 0s and 1s) matrices whose row and column counts are on the order of a few thousand: the number of rows is between 2000 and 7000, and the number of columns is between 4000 and 15000. My computer has more than 100 GB of RAM.
I'm surprised that even at these sizes I am getting a MemoryError with the following code. For reproducibility, I'm including an example with a smaller matrix (10×20). Note that both of the following snippets raise this error:
    import numpy as np
    my_matrix = np.random.randint(2, size=(10, 20))
    # indices (tr[k], tc[k]) of every row pair with tr[k] < tc[k]
    tr, tc = np.triu_indices(my_matrix.shape[0], 1)
    # elementwise product of each pair of rows, summed over the columns
    ut_sums = np.sum(my_matrix[tr] * my_matrix[tc], 1)
    denominator = 100
    value = 1 - ut_sums.astype(float) / denominator
    np.einsum('i->', value)
I tried replacing the elementwise multiplication in the above code with einsum, as below, but it raises the same MemoryError:
    import numpy as np
    my_matrix = np.random.randint(2, size=(10, 20))
    tr, tc = np.triu_indices(my_matrix.shape[0], 1)
    # same pairwise sums, expressed as an einsum contraction
    ut_sums = np.einsum('ij,ij->i', my_matrix[tr], my_matrix[tc])
    denominator = 100
    value = 1 - ut_sums.astype(float) / denominator
    np.einsum('i->', value)
In both cases, the traceback points to the line where ut_sums is calculated.
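One thing I may be missing: even though ut_sums itself is small, the fancy-indexed arguments my_matrix[tr] and my_matrix[tc] are full copies with one row per pair of rows. If I'm estimating correctly, a back-of-envelope sketch at the largest sizes mentioned above (assuming the default int64 dtype, 8 bytes per element) gives:

```python
# Rough size of the fancy-indexed intermediates my_matrix[tr] and
# my_matrix[tc] at the worst-case dimensions from the question.
# Assumes the default int64 dtype (8 bytes per element).
n, m = 7000, 15000             # rows, columns
pairs = n * (n - 1) // 2       # rows produced by triu_indices(n, 1)
bytes_per_copy = pairs * m * 8
print(pairs)                   # 24496500 row pairs
print(bytes_per_copy / 1e12)   # ~2.94 TB per copy, and two copies are built
```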
Please note that my code performs other operations too, and other statistics are calculated on matrices of similar sizes, but with more than 100 GB of RAM I thought this should not be a problem.
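For comparison, the same pairwise sums can be obtained from a single (n, n) Gram matrix, which avoids materializing the (n*(n-1)/2, m) indexed copies. A minimal sketch, assuming these upper-triangle pairwise sums are the quantity ultimately needed:

```python
import numpy as np

my_matrix = np.random.randint(2, size=(10, 20))
tr, tc = np.triu_indices(my_matrix.shape[0], 1)

# Reference: pairwise sums computed as in the question.
ref = np.einsum('ij,ij->i', my_matrix[tr], my_matrix[tc])

# Same quantity via one n-by-n Gram matrix: gram[i, j] is the
# dot product of rows i and j, so no huge row-pair copies are made.
gram = my_matrix @ my_matrix.T
ut_sums = gram[tr, tc]

assert np.array_equal(ref, ut_sums)
```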