Lets say I have the following dataframe:
df = pd.DataFrame({'a':[1,1.1,1.03,3,3.1], 'b':[10,11,12,13,14]})
df
      a   b
0  1.00  10
1  1.10  11
2  1.03  12
3  3.00  13
4  3.10  14
And I want to group nearby points, eg.
df.groupby(#SOMETHING).mean():
          a     b
a                
0  1.043333  11.0
1  3.050000  13.5
Now, I could use
#SOMETHING = pd.cut(df.a, np.arange(0, 5, 2), labels=False)
But only if I know the boundaries beforehand. How can I accomplish similar behavior if I don't know where to place the cuts? ie. I want to group nearby points (with nearby being defined as within some epsilon).
I know this isn't trivial because point x might be near point y, and point y might be near point z, but point x might be too far z; so then its ambiguous what to do--this is kind of a k-means problem, but I'm wondering if pandas has any tools built in to make this easy.
Use case: I have several processes that generate data on regular intervals, but they're not quite synced up, so the timestamps are close, but not identical, and I want to aggregate their data.