What is the Big-O complexity of random.choice(list) in Python 3, where n is the number of elements in the list?
Edit: Thank you all for giving me the answer, now I understand.
O(1). Or, to be more precise, it's equivalent to the big-O of a random-access index lookup in whatever sequence you pass it, and list has O(1) random-access indexing (as does tuple). Simplified, all it does is seq[random.randrange(len(seq))], which is obviously equivalent to a single index lookup operation.
An example where it would be O(n) is collections.deque, where indexing in the middle of the deque is O(n) (with a largish constant divisor though, so it's not that expensive unless the deque is reaching the thousands of elements range or higher). So basically, don't use a deque if it's going to be large and you plan to select random elements from it repeatedly; stick to list, tuple, str, bytes/bytearray, array.array and other sequence types with O(1) indexing (see the timing sketch below).
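To see the difference in practice, here is a minimal timing sketch using only the standard library (exact numbers will vary by machine):

import random
from collections import deque
from timeit import timeit

n = 1_000_000
as_list = list(range(n))
as_deque = deque(as_list)

# random.choice does a single index lookup: O(1) on a list, O(n) on a deque
print(timeit(lambda: random.choice(as_list), number=1000))
print(timeit(lambda: random.choice(as_deque), number=1000))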
Though the question is about random.choice and previous answers have several explanations of it, when I searched for the complexity of np.random.choice I didn't find an answer, so I decided to explain np.random.choice here.
choice(a, size=None, replace=True, p=None). Assume a.shape=(n,) and size=m.
With replacement:
The complexity of np.random.choice is O(m) if p is not specified (a uniform distribution is assumed), and O(n + m log n) if p is specified.
The GitHub code can be found here: np.random.choice.
When p is not specified, choice generates an index array with randint and returns a[index], so the complexity is O(m). (I assume generating a random integer with randint is O(1).)
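For illustration, a rough sketch of that uniform, with-replacement path (this is the idea, not NumPy's actual code):

import numpy as np

rng = np.random.default_rng()
a = np.arange(10)  # n = 10
m = 5

# m O(1) integer draws, then one fancy-indexing pass: O(m) overall
index = rng.integers(0, len(a), size=m)
sample = a[index]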
When p is specified, the function first computes the prefix sum of p. Then it draws m samples from [0, 1) and uses binary search to find the corresponding interval in the prefix sum for each drawn sample. The evidence that binary search is used can be found here. So this process is O(n + m log n). If you need a faster method in this situation, you can use the Alias Method, which needs O(n) preprocessing time and O(m) time to sample m items; both approaches are sketched below.
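First, the prefix-sum plus binary-search idea (again a sketch of the idea, not NumPy's actual code):

import numpy as np

rng = np.random.default_rng()
a = np.arange(5)
p = np.array([0.1, 0.2, 0.3, 0.25, 0.15])
m = 4

cdf = np.cumsum(p)                             # O(n) prefix sum
u = rng.random(m)                              # m uniform samples in [0, 1)
index = np.searchsorted(cdf, u, side='right')  # m binary searches: O(m log n)
sample = a[index]

And a sketch of the Alias Method (Vose's variant; the function names here are mine, not from any library):

import numpy as np

def build_alias(p):
    # O(n) preprocessing: split the scaled probabilities into an
    # acceptance table (prob) and an alias table.
    n = len(p)
    prob = np.asarray(p, dtype=float) * n
    alias = np.zeros(n, dtype=int)
    small = [i for i, x in enumerate(prob) if x < 1.0]
    large = [i for i, x in enumerate(prob) if x >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        alias[s] = l
        prob[l] -= 1.0 - prob[s]
        (small if prob[l] < 1.0 else large).append(l)
    for i in small + large:  # leftovers due to floating-point error
        prob[i] = 1.0
    return prob, alias

def alias_sample(prob, alias, m, rng=None):
    # O(1) per draw, O(m) total: pick a column, then accept it or take its alias.
    if rng is None:
        rng = np.random.default_rng()
    i = rng.integers(0, len(prob), size=m)
    accept = rng.random(m) < prob[i]
    return np.where(accept, i, alias[i])

prob, alias = build_alias([0.1, 0.2, 0.3, 0.25, 0.15])
sample = alias_sample(prob, alias, 4)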
Without replacement: (It's kind of complicated, and maybe I'll finish it in the future.)
If p is not specified, the complexity is the same as np.random.permutation(n), even when m is only 1. See more here, and the sketch below.
If p is specified, the expected complexity is at least $n \log n \log\frac{n}{n + 1 - m}$. (This bound is not tight.)
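For the p-unspecified case above, a tiny sketch of why even m = 1 costs O(n) under that strategy (the idea, not NumPy's actual code):

import numpy as np

rng = np.random.default_rng()
n, m = 10, 3

# Sampling without replacement via a full permutation:
# building the permutation is O(n) even when m is small.
sample = rng.permutation(n)[:m]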
The complexity of random.choice(list) is O(log n) where n is the number of elements in the list.
The CPython implementation uses _randbelow(len(seq)) to get a pseudo-random index and then returns the item at that index.
The bottleneck is the _randbelow() function, which uses rejection sampling to generate a number in the range [0, n). It generates k pseudo-random bits with a call to getrandbits(k), where k = ceil(log2 n). These bits represent a number in the range [0, 2**k). This process is repeated until the generated number is less than n. Each call to the pseudo-random number generator runs in O(k), where k is the number of bits generated, which is O(log n).
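A simplified sketch of that rejection-sampling loop (the real _randbelow lives in Lib/random.py and differs in details):

import random

def randbelow_sketch(n):
    # Draw k = n.bit_length() random bits until the result falls in [0, n).
    k = n.bit_length()
    r = random.getrandbits(k)
    while r >= n:
        r = random.getrandbits(k)
    return r

print(randbelow_sketch(10))  # some integer in [0, 10)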
I think the above answer is incorrect. I empirically verified that the complexity of this operation is O(n). Here is my code and a little plot. I am not sure about the theory though.
from time import time

import numpy as np
import matplotlib.pyplot as plt

# Log-spaced sizes from 1e2 to 1e10 (the largest sizes need a lot of time and memory)
N = np.logspace(2, 10, 40)
output = []
for i, n in enumerate(N):
    print(i)
    n = int(n)
    start = time()
    # Note: building list(range(n)) inside the timed region is itself O(n),
    # so this measures more than the sampling alone.
    A = np.random.choice(list(range(n)), n // 2)
    output.append(time() - start)
plt.plot(N, output)
plt.show()