I was writing a simple Sieve of Eratosthenes in which one produces a list of all of primes up to some number N without performing any division or modular arithmetic. Abstractly, my implementation uses an array of N boolean values which all start out False and are eventually flipped to True over the course of the algorithm.
I wanted to know if this would be faster and/or use less memory if I implemented it as a list of 0 and 1, a list of True and False, or a bytearray of 0 and 1.
Timing (python 2.7)
Using python 2.7, I found the following when using N = 10k and N = 30M:
$ python -m timeit -s 'import sieve' -n 10 'sieve.soe_list(3000000)'
10 loops, best of 3: 1.42 sec per loop
$ python -m timeit -s 'import sieve' -n 10 'sieve.soe_byte(3000000)'
10 loops, best of 3: 1.23 sec per loop
$ python -m timeit -s 'import sieve' -n 10 'sieve.soe_list2(3000000)'
10 loops, best of 3: 1.65 sec per loop
$ python -m timeit -s 'import sieve' -n 500 'sieve.soe_list(10000)'
500 loops, best of 3: 3.59 msec per loop
$ python -m timeit -s 'import sieve' -n 500 'sieve.soe_byte(10000)'
500 loops, best of 3: 4.12 msec per loop
$ python -m timeit -s 'import sieve' -n 500 'sieve.soe_list2(10000)'
500 loops, best of 3: 4.25 msec per loop
10k 3M
byte (01) 4.12 ms 1.23 s
list (01) 3.59 ms 1.42 s
list (TF) 4.25 ms 1.65 s
What surprises me is that for small values of N, the list of integers was the best, and for large values of N, the bytearray was the best. The list of True and False was always slower.
Timing (python 3.3)
I also repeated the test in python 3.3:
$ python -m timeit -s 'import sieve' -n 10 'sieve.soe_list(3000000)'
10 loops, best of 3: 2.05 sec per loop
$ python -m timeit -s 'import sieve' -n 10 'sieve.soe_byte(3000000)'
10 loops, best of 3: 1.76 sec per loop
$ python -m timeit -s 'import sieve' -n 10 'sieve.soe_list2(3000000)'
10 loops, best of 3: 2.02 sec per loop
$ python -m timeit -s 'import sieve' -n 500 'sieve.soe_list(10000)'
500 loops, best of 3: 5.19 msec per loop
$ python -m timeit -s 'import sieve' -n 500 'sieve.soe_byte(10000)'
500 loops, best of 3: 5.34 msec per loop
$ python -m timeit -s 'import sieve' -n 500 'sieve.soe_list2(10000)'
500 loops, best of 3: 5.16 msec per loop
10k 3M
byte (01) 5.34 ms 1.76 s
list (01) 5.19 ms 2.05 s
list (TF) 5.16 ms 2.02 s
Here there was the same ordering with the list being better for small N, and the bytearray for large N, but the list with True and False was not significantly different than the list with 1 and 0.
Memory Use
The memory use was exactly the same in python 2.7 and 3.3. I used sys.getsizeof on the list or bytearray, which was the same size at the beginning and end of the algorithm.
>>> import sieve
>>> sieve.verbose = True
>>> x = sieve.soe_list(10000)
soe_list, 10000 numbers, size = 83120
>>> x = sieve.soe_byte(10000)
soe_byte, 10000 numbers, size = 10993
>>> x = sieve.soe_list2(10000)
soe_list2, 10000 numbers, size = 83120
>>> x = sieve.soe_list(3000000)
soe_list, 3000000 numbers, size = 26791776
>>> x = sieve.soe_byte(3000000)
soe_byte, 3000000 numbers, size = 3138289
>>> x = sieve.soe_list2(3000000)
soe_list2, 3000000 numbers, size = 26791776
10k 3M
byte (01) ~11k ~3.1M
list (01) ~83k ~27M
list (TF) ~83k ~27M
I was a bit surprised~ that the large bytearray used more memory than the large list given that the large bytearray was faster.
EDIT: Oops, as pointed out in the comments, I read my own values wrong and interpreted 27M as 2.7M. The list is really much bigger.
The Question
Can anyone explain why this algorithm runs faster using a list for small N, and faster using a bytearray for large N?
Test code for reference
sieve.py:
import sys
if sys.version_info.major == 3:
xrange = range
verbose = False
def soe_byte(upper):
numbers = bytearray(0 for _ in xrange(0,upper+1))
if verbose:
print("soe_byte, {} numbers, size = {}".format(upper, sys.getsizeof(numbers)))
primes = []
cur = 2
while cur <= upper:
if numbers[cur] == 1:
cur += 1
continue
primes.append(cur)
for i in xrange(cur,upper+1,cur):
numbers[i] = 1
return primes
def soe_list(upper):
numbers = list(0 for _ in xrange(0,upper+1))
if verbose:
print("soe_list, {} numbers, size = {}".format(upper, sys.getsizeof(numbers)))
primes = []
cur = 2
while cur <= upper:
if numbers[cur] == 1:
cur += 1
continue
primes.append(cur)
for i in xrange(cur,upper+1,cur):
numbers[i] = 1
return primes
def soe_list2(upper):
numbers = list(False for _ in xrange(0,upper+1))
if verbose:
print("soe_list2, {} numbers, size = {}".format(upper, sys.getsizeof(numbers)))
primes = []
cur = 2
while cur <= upper:
if numbers[cur] == True:
cur += 1
continue
primes.append(cur)
for i in xrange(cur,upper+1,cur):
numbers[i] = True
return primes