Suppose I have two numpy arrays, A and B, which are potentially very large. I'd like to find a way to efficiently add values to certain entries in A by specifying the indices at which each entry of B should be added.
Normally, one could use the following syntax:
A[indices] += B
The problem is that this doesn't behave as I would expect in cases where indices contains duplicate values. The only solution I've found is to use a manual for-loop, but I was hoping there might be a more efficient way. For example:
A = np.array([100, 200, 300, 400])
B = np.array([1, 2, 3, 4, 5, 6])
indices = [1, 2, 0, 2, 1, 1]
for i, index in enumerate(indices):
    A[index] += B[i]
This yields A = [103, 212, 306, 400], as desired.
In contrast, A[indices] += B yields A = [103, 206, 304, 400], which suggests that the operations A[1] += 1, A[1] += 5, A[2] += 2 are being omitted.
Note: I view the desired behavior as being somewhat similar to a "group by" operation in SQL -- for each value of k, I want to group all entries of B where indices == k and add them into the kth position of A.
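That "group by" view suggests one possible vectorized route I've been considering: np.bincount accepts a weights argument and sums the weights falling into each bin, which (if I understand it correctly) is exactly this accumulation. A sketch, assuming indices is a list of non-negative ints no larger than len(A) - 1:

```python
import numpy as np

A = np.array([100, 200, 300, 400])
B = np.array([1, 2, 3, 4, 5, 6])
indices = [1, 2, 0, 2, 1, 1]

# Sum the entries of B grouped by their target index, then add the
# per-index totals into A. bincount returns a float array when weights
# are supplied, so cast back to A's dtype before adding.
totals = np.bincount(indices, weights=B, minlength=len(A))
A = A + totals.astype(A.dtype)

print(A)  # [103 212 306 400]
```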
My question is: is there a more efficient way to perform this operation? I'm hoping there's some built-in numpy functionality which would be better-optimized for performance than my for-loop above.
For reference, I'm using numpy version 1.13.3.
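For what it's worth, one built-in candidate I've come across is np.ufunc.at (here, np.add.at), which the docs describe as an unbuffered in-place operation, so repeated indices should each contribute. A sketch of the one-dimensional case; I haven't benchmarked it against the loop:

```python
import numpy as np

A = np.array([100, 200, 300, 400])
B = np.array([1, 2, 3, 4, 5, 6])
indices = [1, 2, 0, 2, 1, 1]

# Unbuffered in-place addition: every occurrence of a duplicated index
# contributes, unlike A[indices] += B, which keeps only the last update.
np.add.at(A, indices, B)

print(A)  # [103 212 306 400]
```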
Higher-dimensional case
If it's possible to generalize this to higher-dimensional arrays, I'd be interested to hear that, too. For example, is there a more efficient way to perform the following?
A = (1 + np.arange(12).reshape(3,4)) * 100
B = (1 + np.arange(18)).reshape(3,6)
row_indices = [2, 0, 2]
col_indices = [1, 2, 0, 2, 1, 1]
for i, row_index in enumerate(row_indices):
    for j, col_index in enumerate(col_indices):
        A[row_index, col_index] += B[i, j]
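If np.add.at turns out to be the right tool, I believe the nested loop above can be expressed by passing a tuple of index arrays that broadcast to B's shape, so each (row_index, col_index) pair is visited once per (i, j). A sketch under that assumption:

```python
import numpy as np

A = (1 + np.arange(12).reshape(3, 4)) * 100
B = (1 + np.arange(18)).reshape(3, 6)
row_indices = np.array([2, 0, 2])
col_indices = np.array([1, 2, 0, 2, 1, 1])

# row_indices[:, None] has shape (3, 1) and col_indices[None, :] has
# shape (1, 6); together they broadcast to B's shape (3, 6), addressing
# the same (row, col) pairs as the nested loop. np.add.at accumulates
# duplicates because the operation is unbuffered.
np.add.at(A, (row_indices[:, None], col_indices[None, :]), B)

print(A)
```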