It's because the groupby object handles all the bookkeeping, while the grouper objects just reference their key and the parent groupby object:

    typedef struct {
        PyObject_HEAD
        PyObject *it;          /* iterator over the input sequence */
        PyObject *keyfunc;     /* the second argument for the groupby function */
        PyObject *tgtkey;      /* the key for the current "grouper" */
        PyObject *currkey;     /* the key for the current "item" of the iterator */
        PyObject *currvalue;   /* the plain value of the current "item" */
    } groupbyobject;

    typedef struct {
        PyObject_HEAD
        PyObject *parent;      /* the groupby object */
        PyObject *tgtkey;      /* the key value for this grouper object. */
    } _grouperobject;

Since you're not iterating the grouper objects when you unpack the groupby object, I'll ignore them for now. What's interesting is what happens in the groupby object when you call next on it:

    static PyObject *
    groupby_next(groupbyobject *gbo)
    {
        PyObject *newvalue, *newkey, *r, *grouper;

        /* skip to next iteration group */
        for (;;) {
            if (gbo->currkey == NULL)
                /* pass */;
            else if (gbo->tgtkey == NULL)
                break;
            else {
                int rcmp;

                rcmp = PyObject_RichCompareBool(gbo->tgtkey, gbo->currkey, Py_EQ);
                if (rcmp == 0)
                    break;
            }

            newvalue = PyIter_Next(gbo->it);
            if (newvalue == NULL)
                return NULL;   /* just return NULL, no invalidation of attributes */
            newkey = PyObject_CallFunctionObjArgs(gbo->keyfunc, newvalue, NULL);

            gbo->currkey = newkey;
            gbo->currvalue = newvalue;
        }
        gbo->tgtkey = gbo->currkey;

        grouper = _grouper_create(gbo, gbo->tgtkey);
        r = PyTuple_Pack(2, gbo->currkey, grouper);
        return r;
    }

I removed the irrelevant exception handling code and removed or simplified the pure reference counting code. The interesting thing here is that when you reach the end of the iterator, gbo->currkey, gbo->currvalue and gbo->tgtkey aren't set to NULL. They still point to the last encountered values (the last item of the iterator), because the function just returns NULL when PyIter_Next(gbo->it) == NULL.
Once the unpacking has finished you have your two grouper objects. The first one has a tgtkey of False and the second one a tgtkey of True. Let's have a look at what happens when you call next on these groupers:
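To check from Python that unpacking alone drives the groupby object to the end of its input, a small sketch like the following can help. The `source` iterator and the `x > 5` key are assumed example data, not taken from the question.

```python
import itertools

# Hypothetical input: keep a handle on the underlying iterator so it can be inspected.
source = iter(range(10))

# Unpacking calls next() on the groupby object until it raises StopIteration,
# without iterating either grouper.
(k1, g1), (k2, g2) = itertools.groupby(source, key=lambda x: x > 5)

# The shared underlying iterator has already been consumed completely.
leftover = next(source, "exhausted")
print(k1, k2, leftover)  # False True exhausted
```

So by the time you touch a grouper, the groupby object has already walked through every item.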

    static PyObject *
    _grouper_next(_grouperobject *igo)
    {
        groupbyobject *gbo = (groupbyobject *)igo->parent;
        PyObject *newvalue, *newkey, *r;
        int rcmp;

        if (gbo->currvalue == NULL) {
            /* removed because irrelevant. */
        }

        rcmp = PyObject_RichCompareBool(igo->tgtkey, gbo->currkey, Py_EQ);
        if (rcmp <= 0)
            /* got any error or current group is end */
            return NULL;

        r = gbo->currvalue;   /* this accesses the last value of the groupby object */
        gbo->currvalue = NULL;
        gbo->currkey = NULL;

        return r;
    }

Remember that currvalue is not NULL, so the first if branch isn't interesting. For your first grouper, it compares the grouper's tgtkey with the currkey of the groupby object, sees that they differ, and immediately returns NULL. That's why you got an empty list.
For the second grouper, its tgtkey and the currkey of the groupby object are equal, so it returns the currvalue of the groupby object (which is the last value encountered in the iterator!). This time it also sets the currvalue and currkey of the groupby object to NULL.
Switching back to Python: the really interesting quirks happen if you have a grouper with the same tgtkey as the last group in your groupby:
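To make the two shortened C functions easier to follow, here is a rough pure-Python model of them. It is only a sketch: the class names `GroupBy` and `Grouper` and the `_NULL` sentinel are made up for illustration, and the branch that the C excerpt elides (fetching the next item when currvalue is NULL) is filled in as an assumption about what it does.

```python
_NULL = object()  # hypothetical stand-in for a C NULL pointer


class GroupBy:
    """Rough model of the shortened groupby_next shown above."""

    def __init__(self, iterable, keyfunc):
        self.it = iter(iterable)
        self.keyfunc = keyfunc
        self.tgtkey = _NULL
        self.currkey = _NULL
        self.currvalue = _NULL

    def __iter__(self):
        return self

    def __next__(self):
        # Skip to the next group.
        while True:
            if self.currkey is _NULL:
                pass
            elif self.tgtkey is _NULL:
                break
            elif self.tgtkey != self.currkey:
                break
            # StopIteration propagates here; no attribute is reset.
            newvalue = next(self.it)
            self.currkey = self.keyfunc(newvalue)
            self.currvalue = newvalue
        self.tgtkey = self.currkey
        return self.currkey, Grouper(self, self.tgtkey)


class Grouper:
    """Rough model of the shortened _grouper_next shown above."""

    def __init__(self, parent, tgtkey):
        self.parent = parent
        self.tgtkey = tgtkey

    def __iter__(self):
        return self

    def __next__(self):
        gbo = self.parent
        if gbo.currvalue is _NULL:
            # Assumption: the elided branch fetches the next item from the
            # shared iterator and stores it on the parent.
            newvalue = next(gbo.it)
            gbo.currkey = gbo.keyfunc(newvalue)
            gbo.currvalue = newvalue
        if self.tgtkey != gbo.currkey:
            raise StopIteration  # the current item belongs to another group
        r = gbo.currvalue  # the last value seen by the groupby object
        gbo.currvalue = _NULL
        gbo.currkey = _NULL
        return r


# Assumed example data: two groups, keyed by the first tuple element.
inputs = [(x > 5, x) for x in range(10)]
(_, g1), (_, g2) = GroupBy(inputs, lambda x: x[0])
first, second = list(g1), list(g2)
print(first, second)  # [] [(True, 9)]
```

With this model, the first grouper comes out empty and the second yields only the last item, which is the behaviour described above.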

    >>> import itertools
    >>> inputs = [(x > 5, x) for x in range(10)] + [(False, 10)]
    >>> (_, g1), (_, g2), (_, g3) = itertools.groupby(inputs, key=lambda x: x[0])
    >>> list(g1)
    [(False, 10)]
    >>> list(g3)
    []

That one element in g1 doesn't belong to the first group at all. But because the tgtkey of the first grouper object is False and the currkey left behind in the groupby object is also False, the first grouper treats it as part of its own group. Returning it also reset the currvalue and currkey of the groupby object, so the third group is now empty.
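If you need the contents of every group, one way to sidestep these quirks is to consume each grouper while the groupby object is still positioned on it. A minimal sketch, reusing the same example input:

```python
import itertools

inputs = [(x > 5, x) for x in range(10)] + [(False, 10)]

# Turn each grouper into a list right away, before groupby advances
# to the next group and the shared iterator moves on.
groups = [
    (key, list(group))
    for key, group in itertools.groupby(inputs, key=lambda x: x[0])
]

keys = [key for key, _ in groups]
sizes = [len(items) for _, items in groups]
print(keys)   # [False, True, False]
print(sizes)  # [6, 4, 1]
```

Here (False, 10) ends up in the third group, where it belongs.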
All the code was taken from the Python source code but shortened.