(pandas version 0.16.0, numpy version 1.9.2)
I'm trying to bin the values in a column and then find the rows of the original data that hold the maximum value of each bin.
I found an approach that works on float sample data but fails on int data:
>>> import numpy as np
>>> from pandas import *
>>> df1 = DataFrame({"id": range(3),"a": np.random.random(3)})
>>> df2 = DataFrame({"id": range(3),"a": [0,1,5]})
>>> bins = [0,1,2]
>>> grouped1 = df1.a.groupby(cut(df1.a,bins))
>>> grouped2 = df2.a.groupby(cut(df2.a,bins))
>>> idx1 = grouped1.transform(max) == df1.a
>>> df1[idx1]
           a  id
0  0.997843  0
>>> idx2 = grouped2.transform(max) == df2.a
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/site-packages/pandas/core/groupby.py", line 2418, in transform
    return self._transform_fast(cyfunc)
  File "/usr/lib/python2.7/site-packages/pandas/core/groupby.py", line 2459, in _transform_fast
    return self._set_result_index_ordered(Series(values))
  File "/usr/lib/python2.7/site-packages/pandas/core/groupby.py", line 493, in _set_result_index_ordered
    index = Index(np.concatenate([ indices[v] for v in self.grouper.result_index ]))
KeyError: '(1, 2]'
Note that with these bins, both groupings produce a NaN for the empty (1, 2] bin:
>>> grouped1.max()
a
(0, 1]    0.859684
(1, 2]         NaN
Name: a, dtype: float64
>>> grouped2.max()
a
(0, 1]     1
(1, 2]   NaN
Name: a, dtype: float64
I'm having trouble understanding what the problem is. A KeyError on a bin label, raised only for the int data, doesn't make much sense to me.
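
For reference, the result I'm after can be sketched without transform, by asking each bin for the index label of its maximum. This is only a sketch under assumptions: it relies on a newer pandas than the one above (the observed argument to groupby is not available in 0.16.0), and rows_at_bin_max is a hypothetical helper name, not something from pandas.

```python
import pandas as pd


def rows_at_bin_max(df, column, bins):
    """Return the rows of df that hold the maximum of `column` within each bin.

    Hypothetical helper for illustration; assumes a pandas version whose
    groupby accepts the `observed` argument.
    """
    # Values that fall outside every bin become NaN
    # (e.g. 0 and 5 with right-closed bins [0, 1, 2]).
    binned = pd.cut(df[column], bins)
    # observed=True skips empty bins such as (1, 2];
    # rows with a NaN bin are dropped by groupby by default.
    idx = df[column].groupby(binned, observed=True).idxmax()
    # idx holds one index label per non-empty bin.
    return df.loc[idx]


df2 = pd.DataFrame({"id": range(3), "a": [0, 1, 5]})
result = rows_at_bin_max(df2, "a", [0, 1, 2])
print(result)
```

Unlike the transform(max) == df.a comparison, idxmax keeps only the first row per bin when several rows tie for the maximum, so this is not a drop-in replacement if ties matter.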