With a more modestly sized matrix, my extractor code is slower, but only by about 2x.
In [1]: import numpy as np
In [2]: from scipy import sparse
In [4]: M = sparse.random(1000,1000,.1,'csr')
In [5]: M
Out[5]: 
<1000x1000 sparse matrix of type '<class 'numpy.float64'>'
    with 100000 stored elements in Compressed Sparse Row format>
In [6]: idx = np.arange(100)
In [7]: M[idx,:]
Out[7]: 
<100x1000 sparse matrix of type '<class 'numpy.float64'>'
    with 10116 stored elements in Compressed Sparse Row format>
In [8]: timeit M[idx,:]
173 µs ± 514 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
The extractor:
In [10]: def extractor(indices, N):
    ...:     # one nonzero per row: row i has a 1 at column indices[i]
    ...:     indptr = np.arange(len(indices)+1)
    ...:     data = np.ones(len(indices))
    ...:     shape = (len(indices), N)
    ...:     return sparse.csr_matrix((data, indices, indptr), shape=shape)
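To see what this builds: row i of the extractor has a single 1 at column indices[i], so left-multiplying M by it pulls out exactly those rows. A tiny illustration (not from the session above):

E = extractor(np.array([4, 0, 2]), 5)
print(E.toarray())
# [[0. 0. 0. 0. 1.]
#  [1. 0. 0. 0. 0.]
#  [0. 0. 1. 0. 0.]]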
The nnz counts match:
In [13]: extractor(idx, M.shape[0])*M
Out[13]: 
<100x1000 sparse matrix of type '<class 'numpy.float64'>'
    with 10116 stored elements in Compressed Sparse Row format>
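Matching nnz is only a weak check; the two results can also be compared element for element:

# 0 nonzeros in the inequality matrix means the results are identical
(extractor(idx, M.shape[0])*M != M[idx,:]).nnz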
Somewhat slower, but not as much as in your example.
In [14]: timeit extractor(idx, M.shape[0])*M
302 µs ± 1.02 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
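Part of that gap is presumably the cost of building the extractor matrix on each call; timing the construction alone (which I haven't done in this session) would separate it from the multiply:

# time just the construction, apart from the product
timeit extractor(idx, M.shape[0])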
This difference is larger than what I saw when I tested this 5 years ago.
Since I have other things to do, I'm not going to test the other cases - a larger M and a longer idx.
You are welcome to explore the sparse code to see whether they have changed the calculation, perhaps streamlining it in ways that my reconstruction doesn't capture.
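If someone does want to push on those cases, a sweep along these lines (using the timeit module instead of the magic; the sizes and density are arbitrary choices, and it assumes the extractor defined above) would map out how the gap scales:

import timeit
import numpy as np
from scipy import sparse

# Sweep the matrix size, keeping 10% density and selecting 10% of the rows.
for n in (1000, 2000, 4000):
    M = sparse.random(n, n, 0.1, 'csr')
    idx = np.arange(n // 10)
    t1 = timeit.timeit(lambda: M[idx, :], number=100)
    t2 = timeit.timeit(lambda: extractor(idx, M.shape[0]) * M, number=100)
    print(f'n={n}: direct {t1*1e4:.0f} µs/loop, extractor {t2*1e4:.0f} µs/loop')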