Recently I've been writing a download program that uses the HTTP Range header to download many blocks of a file at the same time. I wrote a Python class to represent the Range (the HTTP Range header specifies a closed interval):
class ClosedRange:
    def __init__(self, begin, end):
        self.begin = begin
        self.end = end

    def __iter__(self):
        # Yield the two endpoints so an instance can be unpacked.
        yield self.begin
        yield self.end

    def __str__(self):
        return '[{0.begin}, {0.end}]'.format(self)

    def __len__(self):
        # A closed interval includes both endpoints.
        return self.end - self.begin + 1
The __iter__ magic method is there to support tuple unpacking:
header = {'Range': 'bytes={}-{}'.format(*the_range)}
And len(the_range) is the number of bytes in that Range.
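For example, with a hypothetical 500-byte range (the offsets are made up for illustration):

>>> the_range = ClosedRange(0, 499)
>>> 'bytes={}-{}'.format(*the_range)
'bytes=0-499'
>>> len(the_range)
500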
Now I have found that 'bytes={}-{}'.format(*the_range) occasionally raises a MemoryError. After some debugging, I found that the CPython interpreter tries to call len(iterable) when executing func(*iterable), and (may) allocate memory based on the reported length. On my machine, the MemoryError appears when len(the_range) is greater than 1 GB.
Here is a simplified reproduction:
class C:
    def __iter__(self):
        yield 5

    def __len__(self):
        # Report a huge 'fake' length without actually holding that many elements.
        print('__len__ called')
        return 1024**3

def f(*args):
    return args
>>> c = C()
>>> f(*c)
__len__ called
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
MemoryError
>>> # BTW, `list(the_range)` has the same problem.
>>> list(c)
__len__ called
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
MemoryError
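As far as I can tell, the same hint is exposed at the Python level as operator.length_hint (PEP 424), which tries len() first. Obtaining the hint itself is cheap; it is only the pre-allocation based on it that fails:

>>> import operator
>>> operator.length_hint(c)
__len__ called
1073741824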
So my questions are:

1. Why does CPython call len(iterable)? From this question I see that you can't know an iterator's length until you have iterated through it. Is this an optimization?
2. Can the __len__ method return a 'fake' length (i.e. not the real number of elements in memory) of an object?