I have the following statement in Pandas that uses the apply method, and it can take up to 2 minutes to run.
I read that in order to improve the speed I should vectorize the statement. My original statement looks like this:
output_data["on_s"] = output_data["m_ind"].apply(lambda x: my_matrix[x, 0] + my_matrix[x, 1] + my_matrix[x, 2])
Here my_matrix is a scipy.sparse matrix. So my initial step was to use the sum method:
summed_matrix = my_matrix.sum(axis=1)
But after this point I am stuck on how to proceed.
Update: Including example data
The matrix looks like this (scipy.sparse.csr_matrix):
(290730, 2)     0.3058016922838267
(290731, 2)     0.3390328430763723
(290733, 2)     0.0838999800585995
(290734, 2)     0.0237008960604337
(290735, 2)     0.0116864263235209
output_data["m_ind"] is just a Pandas Series that has some values like so:
97543
97544
97545
97546
97547
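
To make the question reproducible, here is a minimal sketch of the setup together with the direction I think the row-sum idea might go. The small random matrix and the index values are made up for illustration and are not my real data. I am also assuming the matrix may have more than three columns, so the sketch sums only columns 0 to 2 rather than the whole row, and I am not certain this is the best approach:

```python
import numpy as np
import pandas as pd
from scipy import sparse

# Hypothetical small data standing in for the real matrix and series.
rng = np.random.default_rng(0)
dense = rng.random((10, 4))
dense[dense < 0.5] = 0.0          # zero out entries so the matrix is sparse
my_matrix = sparse.csr_matrix(dense)
output_data = pd.DataFrame({"m_ind": [3, 0, 7, 7, 2]})

# Original row-by-row version, kept only for comparison.
slow = output_data["m_ind"].apply(
    lambda x: my_matrix[x, 0] + my_matrix[x, 1] + my_matrix[x, 2]
)

# Sum columns 0..2 once for every row, flatten the result to a 1-D array,
# then pick out the rows named in m_ind with a single fancy-indexing step.
row_sums = np.asarray(my_matrix[:, :3].sum(axis=1)).ravel()
output_data["on_s"] = row_sums[output_data["m_ind"].to_numpy()]
```

On this toy data the two versions should agree, which can be checked with np.allclose(slow, output_data["on_s"]).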
 
    