Suppose I have two arrays:
- `x`, which contains `m` points;
- `c`, which contains `m` cluster ids, one for each corresponding point in `x`.
I want to calculate the mean of the points that share the same id, i.e. that belong to the same cluster. I know that `c` contains integers in the range [0, k) and that every value in that range appears in `c` at least once.
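Written as a formula, with $x_j$ the $j$-th row of `x` and $c_j$ its cluster id, the quantity wanted for each cluster $i \in \{0, \dots, k-1\}$ is:

$$
\mu_i = \frac{1}{n_i} \sum_{j \,:\, c_j = i} x_j, \qquad n_i = \bigl|\{\, j : c_j = i \,\}\bigr|
$$

Because every id in [0, k) occurs in `c`, each $n_i \ge 1$, so the division is always defined. For example, with points (0, 0), (1, 1), (2, 4), (3, 5) and ids 0, 1, 0, 1, cluster 0 has mean ((0 + 2)/2, (0 + 4)/2) = (1, 2) and cluster 1 has mean ((1 + 3)/2, (1 + 5)/2) = (2, 3).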
My current solution looks like the following:
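```python
import numpy as np

np.random.seed(42)
k = 3
x = np.random.rand(100, 2)                    # m = 100 points in 2-D
c = np.random.randint(0, k, size=x.shape[0])  # cluster id of each point

mu = np.zeros((k, 2))
for i in range(k):
    # Mean of all points assigned to cluster i
    mu[i] = x[c == i].mean(axis=0)
```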
This approach works, but is there a more efficient way to compute the means in NumPy without an explicit `for` loop?
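A possible loop-free sketch, offered as one option rather than a definitive answer: split each mean into a per-cluster sum and a per-cluster count, then divide. The sums are accumulated either with `np.add.at` (unbuffered in-place addition, so repeated ids are all counted) or with a one-hot membership matrix and a single matrix product; the counts come from `np.bincount`. The variable names (`sums`, `counts`, `onehot`, `mu_add_at`, `mu_onehot`, `mu_loop`) are illustrative only, and the code assumes, as stated above, that every cluster id in [0, k) is present, so no count is zero.

```python
import numpy as np

np.random.seed(42)
k = 3
x = np.random.rand(100, 2)
c = np.random.randint(0, k, size=x.shape[0])

# Variant 1: per-cluster sums via unbuffered np.add.at,
# divided by per-cluster counts from np.bincount.
sums = np.zeros((k, x.shape[1]))
np.add.at(sums, c, x)                  # sums[c[j]] += x[j] for every j
counts = np.bincount(c, minlength=k)   # number of points in each cluster
mu_add_at = sums / counts[:, None]     # broadcast counts across columns

# Variant 2: one-hot membership matrix of shape (m, k), then one matmul.
# Uses O(m * k) extra memory.
onehot = (c[:, None] == np.arange(k)).astype(x.dtype)
mu_onehot = (onehot.T @ x) / onehot.sum(axis=0)[:, None]

# Reference result: the loop from the question.
mu_loop = np.array([x[c == i].mean(axis=0) for i in range(k)])

print(np.allclose(mu_add_at, mu_loop), np.allclose(mu_onehot, mu_loop))
```

Which variant is fastest likely depends on `m`, `k`, and the NumPy version, so timing both against the loop on realistic data is the safest way to choose.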