How to improve np.random.choice() looping efficiency

Question

I am trying to apply np.random.choice to a big array with different weights, and wondering any way could avoid looping and improve the performance? Over here len(weights) could be millions.

weights = [[0.1, 0.5, 0.4],
           [0.2, 0.4, 0.4],
           ...
           [0.3, 0.3, 0.4]]

choice = [1, 2, 3]
ret = np.zeros((len(weights), 20))
for i in range(len(weights)):
    ret[i] = np.random.choice(choice, 20, p=weights[i])

I am afraid that you can't help but looping. Perhaps, you could still speed things up by using `numba`? One thing that could help is, if you have a lot of `weights` duplicates, you could generate those with a single `np.random.choice()` call and put it in the right `ret` place afterwards. Also, you may want to change: `for i in range(len(weights)): ...` into `for i, weight in enumerate(weights): ...` (but that's not going to buy you speed, it is just stylistic good practice). — norok2, Jul 28 '19 at 03:07
@WarrenWeckesser How would they incorporate number of items in the linked one? — Divakar, Jul 28 '19 at 06:57
@Divakar, well, you know, broadcasting, etc. :) OK, it isn't an exact duplicate. — Warren Weckesser, Jul 28 '19 at 07:24
That is the shape of 'weights'? It may be useful for benchmarking — tstanisl, Jul 28 '19 at 07:40

Warren Weckesser · Accepted Answer · 2019-07-28T19:29:17.400

Here's a generalization of my answer in Fast random weighted selection across all rows of a stochastic matrix:

def vectorized_choice(p, n, items=None):
    s = p.cumsum(axis=1)
    r = np.random.rand(p.shape[0], n, 1)
    q = np.expand_dims(s, 1) >= r
    k = q.argmax(axis=-1)
    if items is not None:
        k = np.asarray(items)[k]
    return k

p is expected to be a two-dimensional array whose rows are probability vectors. n is the number of samples to draw from the distribution defined by each row. If items is None, the samples are integers in range(0, p.shape[1]). If items is not None, it is expected to be a sequence with length p.shape[1].

Example:

In [258]: p = np.array([[0.1, 0.5, 0.4], [0.75, 0, 0.25], [0, 0, 1], [1/3, 1/3, 1/3]])                                                   

In [259]: p                                                                                                                              
Out[259]: 
array([[0.1       , 0.5       , 0.4       ],
       [0.75      , 0.        , 0.25      ],
       [0.        , 0.        , 1.        ],
       [0.33333333, 0.33333333, 0.33333333]])

In [260]: vectorized_choice(p, 20)                                                                                                       
Out[260]: 
array([[1, 1, 2, 1, 1, 2, 2, 2, 1, 2, 1, 1, 1, 2, 2, 0, 1, 2, 2, 2],
       [0, 2, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0],
       [2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
       [1, 0, 2, 2, 0, 1, 2, 1, 0, 0, 0, 0, 2, 2, 0, 0, 2, 1, 1, 2]])

In [261]: vectorized_choice(p, 20, items=[1, 2, 3])                                                                                      
Out[261]: 
array([[2, 1, 2, 2, 2, 3, 2, 2, 2, 2, 3, 3, 2, 2, 3, 3, 2, 3, 2, 2],
       [1, 1, 1, 1, 1, 1, 1, 1, 3, 3, 3, 1, 1, 3, 3, 1, 3, 1, 1, 1],
       [3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3],
       [3, 3, 3, 1, 3, 2, 1, 2, 3, 1, 2, 2, 3, 2, 1, 2, 1, 2, 2, 2]])

Timing for p with shape (1000000, 3):

In [317]: p = np.random.rand(1000000, 3)

In [318]: p /= p.sum(axis=1, keepdims=True)

In [319]: %timeit vectorized_choice(p, 20, items=np.arange(1, p.shape[1]+1))
1.89 s ± 28.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Here's the timing for Divakar's function:

In [320]: %timeit random_choice_prob_vectorized(p, 20, choice=np.arange(1, p.shape[1]+1))
7.33 s ± 43.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

The difference will be less pronounced if you increase the number of columns in p, and if you make the number of columns big enough, Divakar's function will be faster. E.g.

In [321]: p = np.random.rand(1000, 120)

In [322]: p /= p.sum(axis=1, keepdims=True)

In [323]: %timeit vectorized_choice(p, 20, items=np.arange(1, p.shape[1]+1))
6.41 ms ± 20.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [324]: %timeit random_choice_prob_vectorized(p, 20, choice=np.arange(1, p.shape[1]+1))
6.29 ms ± 342 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Updated my solution. Not as good as yours, but much better than my previous one. Could you take the trouble of updating the timings? Thanks! — Divakar, Jul 28 '19 at 09:11
@Divakar, OK, but in my tests, the new version is actually a bit slower than the older version. :( — Warren Weckesser, Jul 28 '19 at 14:47
Thanks, but in vectorized_choice, prob_matrix should be p, right? — Nick, Jul 28 '19 at 19:07

Divakar · Answer 2 · 2019-07-28T09:09:37.193

Borrowing idea from Vectorizing numpy.random.choice for given 2D array of probabilities along an axis alongwith idea from vectorized searchsorted, here's one vectorized way -

def random_choice_prob_vectorized(weights, num_items, choice=None):
    weights = np.asarray(weights)

    w = weights.cumsum(1)
    r = np.random.rand(len(weights),num_items)

    m,n = w.shape
    o = np.arange(m)[:,None]
    w_o = (w+o).ravel()
    r_o = (r+o).ravel()
    idx = np.searchsorted(w_o,r_o).reshape(m,-1)%n
    if choice is not None:
        return np.asarray(choice)[idx]
    else:
        return idx

Sample run to verify using 2D bincount -

In [28]: weights = [[0.1, 0.5, 0.4],
    ...:            [0.2, 0.4, 0.4],
    ...:            [0.3, 0.3, 0.4]]
    ...: 
    ...: choice = [1, 2, 3]
    ...: num_items = 20000

In [29]: out = random_choice_prob_vectorized(weights, num_items, choice)

# Use 2D bincount to get per average occurences and verify against weights
In [75]: bincount2D_vectorized(out)/num_items
Out[75]: 
array([[0.     , 0.09715, 0.4988 , 0.40405],
       [0.     , 0.1983 , 0.40235, 0.39935],
       [0.     , 0.30025, 0.29485, 0.4049 ]])

score 0 · Answer 3 · answered Jul 28 '19 at 05:04

Looks like each row of the resulting array is independent of other rows. I am not sure how bad is the performance now. If it really is a concern, I would try to use python's multiprocessing module to run random number generations with several processes in parallel. It should help.

How to improve np.random.choice() looping efficiency

3 Answers3

Linked