I have a data frame X whose sun column always starts and ends with a run of zeros. I call .diff() on the sun column to get the difference between each interval and the previous one, but this produces large values at the start and at the end of the day (marked in yellow in data frame Y). How can I calculate the difference starting from the 3:30 timestamp, so that I get a data frame Z with zero instead of 100 and -142?
        Krish
        
- It would be helpful if you added a sample dataframe to this question. https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples – Scott Boston Aug 15 '20 at 23:46
- @ScottBoston my bad, just added it. – Krish Aug 15 '20 at 23:48
- Are there any zeroes, or could there be any zeroes, inside your valid data range? – Scott Boston Aug 16 '20 at 00:06
- There won't be zeroes. – Krish Aug 16 '20 at 01:25
1 Answer
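If there are no zeroes inside the valid data range, drop the zero rows, take the diff of what remains, and reindex back to the full index: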
df.loc[~df['sun'].eq(0), 'sun'].diff().fillna(0).reindex(df.index, fill_value=0)
Output:
2020-07-20 03:05:00     0.0
2020-07-20 03:10:00     0.0
2020-07-20 03:15:00     0.0
2020-07-20 03:20:00     0.0
2020-07-20 03:25:00     0.0
2020-07-20 03:30:00    21.0
2020-07-20 03:35:00     1.0
2020-07-20 03:40:00    12.0
2020-07-20 03:45:00   -12.0
2020-07-20 03:50:00    20.0
2020-07-20 03:55:00     0.0
2020-07-20 04:00:00     0.0
2020-07-20 04:05:00     0.0
Freq: 5T, Name: sun, dtype: float64
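For anyone who wants to run this end to end, here is a minimal sketch. The sample values are an assumption: they are reconstructed from the output above and from the 100 and -142 mentioned in the question, not copied from the asker's real data.

```python
import pandas as pd

# Assumed sample data, reconstructed from the output shown above.
idx = pd.date_range("2020-07-20 03:05", periods=13, freq="5min")
df = pd.DataFrame(
    {"sun": [0, 0, 0, 0, 100, 121, 122, 134, 122, 142, 0, 0, 0]},
    index=idx,
)

# Plain diff: jumps of 100 and -142 appear where the zero padding
# meets the real data.
plain = df["sun"].diff()

# Keep only the non-zero rows, diff them, then restore the full index.
# The first non-zero row has no predecessor, so its NaN becomes 0.
nonzero = df.loc[df["sun"].ne(0), "sun"]
trimmed = nonzero.diff().fillna(0).reindex(df.index, fill_value=0)

print(trimmed)
```

Note that this only works when every zero is padding: a zero inside the valid range would be dropped before the diff and silently skipped.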
Otherwise, let's find the start and end of the valid data range and diff only that slice:
s = df.where(df['sun'].ne(0))
idx_start = s.first_valid_index()
idx_end = s.last_valid_index()
df.loc[idx_start:idx_end].diff().fillna(0).reindex(df.index, fill_value=0)
Output:
                      sun
2020-07-20 03:05:00   0.0
2020-07-20 03:10:00   0.0
2020-07-20 03:15:00   0.0
2020-07-20 03:20:00   0.0
2020-07-20 03:25:00   0.0
2020-07-20 03:30:00  21.0
2020-07-20 03:35:00   1.0
2020-07-20 03:40:00  12.0
2020-07-20 03:45:00 -12.0
2020-07-20 03:50:00  20.0
2020-07-20 03:55:00   0.0
2020-07-20 04:00:00   0.0
2020-07-20 04:05:00   0.0
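The second approach could be wrapped in a small helper, roughly as sketched below. The function name diff_within_valid_range, the all-zero guard, and the sample data with a zero in the middle of the valid range are all assumptions made for illustration; they are not part of the original question.

```python
import pandas as pd


def diff_within_valid_range(df, col="sun"):
    """Diff `col` only between its first and last non-zero rows.

    Hypothetical helper: rows outside that range are set to 0.
    """
    # Mask zeros as NaN so first/last_valid_index find the data range.
    nonzero = df[col].where(df[col].ne(0))
    start = nonzero.first_valid_index()
    end = nonzero.last_valid_index()
    if start is None:
        # Assumed behaviour for an all-zero column: return all zeros.
        return pd.Series(0.0, index=df.index, name=col)
    # Label-based slicing includes both endpoints.
    inner = df.loc[start:end, col].diff().fillna(0)
    return inner.reindex(df.index, fill_value=0)


# Made-up data with a zero inside the valid range (at 03:25).
idx = pd.date_range("2020-07-20 03:05", periods=9, freq="5min")
df = pd.DataFrame({"sun": [0, 0, 100, 121, 0, 134, 142, 0, 0]}, index=idx)

result = diff_within_valid_range(df)
print(result)
```

Unlike the first approach, the interior zero is kept, so it contributes its own differences (-121 and 134 here) instead of being skipped.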
 
    
    
        Scott Boston
        