My use case is a bit specific. I want to sample 2 items without replacement from a list/array of 50 or 100 elements, so I don't have to worry about arrays of size 10^4 or 10^5, or about multidimensional data.
I want to know:
- Which one, numpy.random.choice() or numpy.random.shuffle(), is faster for this purpose, and why?
- Whether they both produce random samples of "good quality". That is, are both generating good random samples for my purpose, or does one produce less random samples? (Just a sanity check to make sure I am not overlooking something in the source code of these functions.)
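For the second question, a rough empirical check is to count how often each unordered pair comes up under both methods. The following is only a minimal sketch: the helper name pair_counts, the small population size of 5, and the number of draws are my own arbitrary choices, and roughly equal counts are a sanity check, not a proof of statistical quality.

```python
from collections import Counter

import numpy as np


def shuffler(size, num_samples):
    items = list(range(size))
    np.random.shuffle(items)
    return items[:num_samples]


def chooser(size, num_samples):
    return np.random.choice(size, num_samples, replace=False)


def pair_counts(sampler, size=5, draws=20000):
    """Count how often each unordered pair of indices is drawn (hypothetical helper)."""
    counts = Counter()
    for _ in range(draws):
        a, b = sampler(size, 2)
        a, b = int(a), int(b)
        counts[(min(a, b), max(a, b))] += 1
    return counts


np.random.seed(0)  # fixed seed only so that repeated runs are comparable
for name, sampler in (("shuffle", shuffler), ("choice", chooser)):
    counts = pair_counts(sampler)
    # With 5 items there are 10 unordered pairs, so each should appear
    # in roughly draws / 10 of the samples if the sampling is uniform.
    print(name, min(counts.values()), max(counts.values()))
```

If the smallest and largest counts are both close to draws / 10 for each method, neither is obviously biased at this level of scrutiny.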
For Question 1, I tried timing both functions (code below), and the shuffle method seems to be about 5-6 times faster. Any insight you can give about this is most welcome. If there are faster ways to achieve my purpose, I will be glad to hear about them (I had looked at the options in Python's random module, but the fastest method in my testing was np.random.shuffle()).
import numpy as np

def shuffler(size, num_samples):
    # Shuffle the full index list, then keep the first num_samples entries.
    items = list(range(size))
    np.random.shuffle(items)
    return items[:num_samples]

def chooser(size, num_samples):
    # Draw num_samples distinct indices from range(size).
    return np.random.choice(size, num_samples, replace=False)

%timeit shuffler(50, 2)
#> 1.84 µs ± 17.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit chooser(50, 2)
#> 13 µs ± 1.09 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)
You may think it's already optimized and I am wasting time trying to save pennies. But np.random.choice() is called 5000000 times in my code and takes about 8% of my runtime. It is being used in a loop to obtain 2 random samples from the population for each iteration.
Pseudocode:
for t in range(5000000):
    # Random sample of 2 from the population without replacement.
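Since only two distinct indices are needed per iteration, one possible alternative to both functions is to draw the first index from range(n) and the second from the remaining n - 1 positions. This is a minimal sketch that I have not benchmarked; the function name sample_two_indices and the example population are hypothetical, and it uses Python's random module instead of NumPy.

```python
import random


def sample_two_indices(n):
    """Return two distinct indices from range(n); every ordered pair is equally likely."""
    i = random.randrange(n)
    # Draw from the n - 1 remaining positions, then shift past i so that
    # j can never equal i and all other indices stay equally likely.
    j = random.randrange(n - 1)
    if j >= i:
        j += 1
    return i, j


population = list(range(50))  # stand-in for the real population
for t in range(5):  # 5 iterations here instead of 5000000
    i, j = sample_two_indices(len(population))
    first, second = population[i], population[j]
    print(t, first, second)
```

This avoids building or shuffling a 50-element list on every iteration, but whether it is actually faster in the real loop would need to be timed.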
If there is a smarter implementation for my requirement, I am open to suggestions.
PS: I am aware that shuffle operates in place, but as I only need the indices of the two random elements, I do not have to run it on my original array. There are other questions that compare the two functions from Python's random module, but I specifically need 2 samples without replacement.