I am having difficulty figuring out the best way to subset a large data set in Pandas, however my experience with pandas is limited.
My data consists of 4 columns and ~400 million lines. The columns represent datetime, X position, Y position, and then some information about the point. My goal is to subset the DataFrame by X,Y pairs which have a value of 'Stuff' that exceeds a threshold value at least once. To be clear, I want all instances of the X,Y pairs identified, not just those which exceed the threshold. Obviously this needs to be done in a memory efficient manner. I should mention that the data are currently indexed by date (although the dates are not unique).
As an example, if this is my data:
In [6]: data
Out[6]: 
            Stuff   X  Y
Date                    
2004-01-01    2.3  10  3
2004-01-02    5.6  20  2
2004-01-02   14.9  30  4
2004-01-02   10.2  10  3
2004-01-03    3.1  30  4
2004-01-04    2.9  10  5
2004-01-05    3.4  20  4
And I have a threshold of 10, I would expect to get the following as a result since the pairs 10,3 and 30,4 both have an instance of 'Stuff' that exceeds the threshold:
In [7]: data
Out[7]: 
            Stuff   X  Y
Date                    
2004-01-01    2.3  10  3
2004-01-02   14.9  30  4
2004-01-02   10.2  10  3
2004-01-03    3.1  30  4
Any tips would be appreciated. I suspect I may need to change the index to MultiIndex on X,Y, but I'm not sure. I've also looked into using groupby(['X','Y']).max but I don't believe I can subset my original dataframe using these results.
QUESTION UPDATE The solution presented by @jsexauer was what I wanted, however I am suffering from a memory error due to the large size of the data. Any recommendations on how to optimize this process?
 
     
    