Is there an established module, or good practice, to work efficiently with large object pools in Python 3?
What I mean by "object pool" is some class capable of:
- fetching new instances of specified type, while dynamically extending the memory allocation under the hood when necessary;
- maintaining consistent indexing for previously fetched objects, so that pool[k] always returns the k-th object fetched.
Here is a basic example:
class Value:
    __slots__ = ('a', 'b')

    def __init__(self, a=None, b=None):
        self.a = a
        self.b = b

class BasicPool:
    def __init__(self):
        self.data = []

    def __getitem__(self, k):
        return self.data[k]

    def fetch(self):
        v = Value()
        self.data.append(v)
        return v
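class BlockPool:
    def __init__(self, bsize=100):
        self.bsize = bsize
        # Start "past the end" so that the first fetch allocates a block.
        self.next = bsize
        self.data = []

    def __getitem__(self, k):
        # Global index k -> (block number, offset within the block).
        b, k = divmod(k, self.bsize)
        return self.data[b][k]

    def fetch(self):
        self.next += 1
        if self.next >= self.bsize:
            # Current block exhausted: pre-allocate a new block of instances.
            self.data.append([Value() for _ in range(self.bsize)])
            self.next = 0
        return self.data[-1][self.next]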
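The indexing requirement can be checked with a small helper. This is only a sketch: check_indexing and ListPool are hypothetical names introduced for illustration, and ListPool stands in for any pool exposing fetch() and __getitem__ (it stores plain object() instances rather than Value).

```python
def check_indexing(pool, n=1000):
    """Fetch n objects and verify that pool[k] is the k-th object fetched."""
    fetched = [pool.fetch() for _ in range(n)]
    # Identity (not equality) is what matters: the pool must hand back
    # the very same object under the same index.
    return all(pool[k] is obj for k, obj in enumerate(fetched))


class ListPool:
    """Hypothetical stand-in pool, used only to exercise check_indexing."""

    def __init__(self):
        self.data = []

    def __getitem__(self, k):
        return self.data[k]

    def fetch(self):
        obj = object()
        self.data.append(obj)
        return obj


if __name__ == '__main__':
    print(check_indexing(ListPool()))
```

The same helper should accept a fresh BasicPool() or BlockPool() instance, since it relies only on fetch() and integer indexing.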
The BasicPool doesn't do anything smart: whenever a new instance is requested, it is instantiated and appended to an underlying list. The BlockPool, on the other hand, grows a list of pre-allocated blocks of instances. Surprisingly though, preallocation does not seem to be beneficial in practice:
from timeit import default_timer as timer

def benchmark(P):
    N = int(1e6)
    start = timer()
    for _ in range(N):
        P.fetch()
    print(timer() - start)

print('Basic pool:')
for _ in range(5):
    benchmark(BasicPool())
# Basic pool:
# 1.2352294209995307
# 0.5003506309985823
# 0.48115064000012353
# 0.48508202800076106
# 1.1760561199989752
print( 'Block pool:' )
for _ in range(5): benchmark(BlockPool())
# Block pool:
# 0.7272855400005938
# 1.4875716509995982
# 0.726611527003115
# 0.7369502859983186
# 1.4867010340021807
As you can see, the BasicPool is faster than the BlockPool in most runs (I also don't know what causes the large variations between runs). Pools of objects must be a fairly common need in Python; is the best approach really to use the builtin list.append? Are there smarter containers that can further improve runtime performance, or is the cost dominated by instantiation time anyway?
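One way to narrow this down would be to time instantiation and list.append separately. The following is a minimal sketch, not a conclusive measurement: time_loop and compare are hypothetical helper names, and pausing the cyclic garbage collector during timing rests on my unverified guess that it may contribute to the run-to-run variations.

```python
import gc
from timeit import default_timer as timer


class Value:
    __slots__ = ('a', 'b')

    def __init__(self, a=None, b=None):
        self.a = a
        self.b = b


def time_loop(body, n):
    """Return the seconds taken to call body() n times with the cyclic GC paused."""
    was_enabled = gc.isenabled()
    gc.disable()  # assumption: GC passes may add noise to the timings
    try:
        start = timer()
        for _ in range(n):
            body()
        return timer() - start
    finally:
        # Restore the collector to whatever state it was in before.
        if was_enabled:
            gc.enable()


def compare(n=1_000_000):
    """Time instantiation alone, append alone, and both together."""
    shared = Value()
    appended, pooled = [], []
    # Every body is a lambda so that all three pay the same call overhead.
    return {
        'instantiate only': time_loop(lambda: Value(), n),
        'append only': time_loop(lambda: appended.append(shared), n),
        'instantiate + append': time_loop(lambda: pooled.append(Value()), n),
    }


if __name__ == '__main__':
    for label, seconds in compare().items():
        print(f'{label}: {seconds:.3f}')
```

If "instantiate only" accounts for most of the combined time, the choice of container would matter little; if the variations disappear with the collector paused, that would point to garbage collection rather than to the pools themselves.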