I have a 2D NumPy array with repeated values in the first column. Each repeated value can have any corresponding value in the second column.
It's easy to compute a cumsum with NumPy, but I need the sum of the second-column values for each distinct value in the first column (a per-group sum, as shown in the required answer at the bottom).
Below I have solved the problem with an inefficient nested for-loop.
Question: How can we get the same result more efficiently and elegantly using NumPy or pandas?
Any help will be appreciated.
#!python
# -*- coding: utf-8 -*-#
#
# Imports
import pandas as pd
import numpy as np
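np.random.seed(42)  # make results reproducible
aa = np.random.randint(1, 20, size=10).astype(float)  # col0: keys, with repeats
bb = np.arange(10) * 0.1  # col1: values to be summed per key
unq = np.unique(aa)  # sorted distinct keys
ans = np.zeros(len(unq))  # ans[i] will hold the sum of bb where aa == unq[i]
print(aa)
print(bb)
print(unq)
# Slow O(len(unq) * len(aa)) double loop: this is the part I want to replace
for i, u in enumerate(unq):
    for j, a in enumerate(aa):
        if a == u:
            ans[i] += bb[j]
print(ans)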
"""
# given data
idx  col0  col1
0    7.    0.0 
1    15.   0.1
2    11.   0.2
3    8.    0.3
4    7.    0.4
5    19.   0.5
6    11.   0.6
7    11.   0.7
8    4.    0.8
9    8.    0.9
# sorted data
4.    0.8
7.    0.0
7.    0.4
8.    0.9
8.    0.3
11.   0.6
11.   0.7
11.   0.2
15.   0.1
19.   0.5
# sum of col1 for each repeated col0 value
4.    0.8
7.    0.0 + 0.4
8.    0.9 + 0.3
11.   0.6 + 0.7 + 0.2
15.   0.1
19.   0.5
# Required answer
4.    0.8 
7.    0.4    
8.    1.2
11.   1.5
15.   0.1
19.   0.5
"""
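For reference, here is a minimal sketch of one vectorized direction that seems to reproduce the required answer above, assuming the keys and values are 1-D arrays of equal length. The function name `grouped_sum` is only an illustrative name, not something from NumPy; it combines `np.unique(..., return_inverse=True)` with `np.bincount(..., weights=...)`. I am not sure this is the most elegant option.

```python
import numpy as np


def grouped_sum(keys, values):
    """Sum `values` for each distinct entry of `keys` (hypothetical helper)."""
    # unq: sorted distinct keys; inv[j] is the position of keys[j] within unq
    unq, inv = np.unique(keys, return_inverse=True)
    # bincount adds values[j] into bucket inv[j], giving one total per distinct key
    sums = np.bincount(inv, weights=values, minlength=len(unq))
    return unq, sums


# Same data as in the table above
aa = np.array([7., 15., 11., 8., 7., 19., 11., 11., 4., 8.])
bb = np.arange(10) * 0.1

unq, ans = grouped_sum(aa, bb)
for u, s in zip(unq, ans):
    print(u, round(s, 10))
```

With pandas, `pd.Series(bb).groupby(aa).sum()` should give the same totals, indexed by the distinct keys.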
 
    