I am trying to get a fast vectorized version of the following loop:
for i in xrange(N1):
A[y[i]] -= B[i,:]
Here A.shape = (N2,N3), y.shape = (N1) with y taking values in [0,N2[, B.shape = (N1,N3). You can think of entries of y being indices into rows of A. Here N1 is large, N2 is pretty small and N3 is smallish.
I thought simply doing
A[y] -= B
would work, but the issue is that there are repeated entries in y and this does not do the right thing (i.e., if y=[1,1] then A[1] is only added to once, not twice). Also this is does not seem to be any faster than the unvectorized for loop.
Is there a better way of doing this?
EDIT: YXD linked this answer to in comments which at first seems to fit the bill. It would seem you can do exactly what I want with
np.subtract.at(A, y, B)
and it does work, however when I try to run it it is significantly slower than the unvectorized version. So, the question remains: is there a more performant way of doing this?
EDIT2: An example, to make things concrete:
n1,n2,n3 = 10000, 10, 500
A = np.random.rand(n2,n3)
y = np.random.randint(n2, size=n1)
B = np.random.rand(n1,n3)
The for loop, when run using %timeit in ipython gives on my machine:
10 loops, best of 3: 19.4 ms per loop
The subtract.at version produces the same value for A in the end, but is much slower:
1 loops, best of 3: 444 ms per loop