I have a dataframe with 19M rows of different customers (~10K customers) and for their daily consumption over different date ranges. I have resampled this data into weekly consumption and the resulted dataframe is 2M rows. I want to know the ranges of consecutive dates for each customer and select those with the max(range). Any ideas? Thank you!
            Asked
            
        
        
            Active
            
        
            Viewed 51 times
        
    1 Answers
0
            
            
        It would be great if you could post some example code, so the replies will be more specific.
You probably want to do something like earliest = df.groupby('Customer_ID').min()['Consumption_date'] to get the earliest consumption date per customer, and latest = df.groupby('Customer_ID').max()['Consumption_date'] for the latest consumption date, and then take the difference time_span = latest-earliest to get the time span per customer.
Knowing the specific df and variable names would be great
 
    
    
        KingOtto
        
- 840
- 5
- 18
- 
                    hello and thanks for your relpy! In this way i can get the range of the data for each customer, but i dont know if the dates betwenn the min and max date (as specified by your code) will be consecutive! – dogo Feb 16 '22 at 15:56
- 
                    they will not be consecutive - simply the min and the max... Maybe you can update your question a bit, add a data example, and show the desired output. This way it will be much simpler to advise. A general piece of advice: `pd.diff()` allows you to take differences.. I imagine you need some form of grouping by customer, then sorting by date, then taking diff, then selecting all with diff == 1 (consecutive), then do something with it. but again, your question is not 100% clear to me – KingOtto Feb 16 '22 at 16:29
