How to apply diff() only to non-zero data, ignoring the first and last values next to a zero

Question

I have a data frame X that always starts and ends with zeros. I call .diff() on the sun column to get the difference between each interval and the previous one. This produces large values at the start and at the end of the day (marked in yellow in data frame Y). How can I start the difference calculation from the 3:30 timestamp, so that I get a data frame Z with zero instead of 100 and -142?

It would be helpful if you added a sample dataframe to this question. https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples — Scott Boston, Aug 15 '20 at 23:46
Are there any zeroes, or could there be any zeroes, inside your valid data range? — Scott Boston, Aug 16 '20 at 00:06

Scott Boston · Accepted Answer · 2020-08-16T00:16:16.740

If there are no zeroes within the valid data range:

df.loc[~df['sun'].eq(0), 'sun'].diff().fillna(0).reindex(df.index, fill_value=0)

Output:

2020-07-20 03:05:00     0.0
2020-07-20 03:10:00     0.0
2020-07-20 03:15:00     0.0
2020-07-20 03:20:00     0.0
2020-07-20 03:25:00     0.0
2020-07-20 03:30:00    21.0
2020-07-20 03:35:00     1.0
2020-07-20 03:40:00    12.0
2020-07-20 03:45:00   -12.0
2020-07-20 03:50:00    20.0
2020-07-20 03:55:00     0.0
2020-07-20 04:00:00     0.0
2020-07-20 04:05:00     0.0
Freq: 5T, Name: sun, dtype: float64
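
For anyone who wants to reproduce this, here is a minimal sketch. The question did not include sample data, so the `sun` values below are an assumption, reconstructed to match the 100 and -142 jumps mentioned in the question and the differences in the output above. The names `naive` and `masked` are just illustrative.

```python
import pandas as pd

# Hypothetical sample data (not from the question): zeros at both ends,
# non-zero readings in between, on a 5-minute index.
idx = pd.date_range("2020-07-20 03:05", periods=13, freq="5min")
df = pd.DataFrame(
    {"sun": [0, 0, 0, 0, 100, 121, 122, 134, 122, 142, 0, 0, 0]},
    index=idx,
)

# Plain diff: shows the unwanted jumps where the data leaves and re-enters zero.
naive = df["sun"].diff()

# Diff over the non-zero rows only, then put the result back on the full index.
masked = (
    df.loc[df["sun"].ne(0), "sun"]    # keep only non-zero rows
    .diff()                           # first non-zero row becomes NaN
    .fillna(0)                        # ...which is replaced by 0
    .reindex(df.index, fill_value=0)  # rows that were zero get 0 as well
)

print(naive)
print(masked)
```

With this data, `naive` contains 100 at 03:25 and -142 at 03:55, while `masked` has 0 in both places.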

Otherwise, let's find the start and end of the valid data range:

s = df.where(df['sun'].ne(0))
idx_start = s.first_valid_index()
idx_end = s.last_valid_index()
df.loc[idx_start:idx_end].diff().fillna(0).reindex(df.index, fill_value=0)

Output:

                      sun
2020-07-20 03:05:00   0.0
2020-07-20 03:10:00   0.0
2020-07-20 03:15:00   0.0
2020-07-20 03:20:00   0.0
2020-07-20 03:25:00   0.0
2020-07-20 03:30:00  21.0
2020-07-20 03:35:00   1.0
2020-07-20 03:40:00  12.0
2020-07-20 03:45:00 -12.0
2020-07-20 03:50:00  20.0
2020-07-20 03:55:00   0.0
2020-07-20 04:00:00   0.0
2020-07-20 04:05:00   0.0
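
To see why the presence of zeroes inside the valid range matters, here is a small sketch on made-up data with one zero in the middle of the readings. The values and the names `masked` and `trimmed` are assumptions for illustration only. It works on the `sun` Series directly rather than on the whole DataFrame.

```python
import pandas as pd

# Hypothetical data: leading/trailing zeros plus one zero inside the valid range.
idx = pd.date_range("2020-07-20 03:05", periods=9, freq="5min")
df = pd.DataFrame({"sun": [0, 0, 100, 121, 0, 134, 142, 0, 0]}, index=idx)

# Approach 1: drop every zero before diffing. The interior zero is skipped,
# so 134 is compared with 121 instead of with 0.
masked = (
    df.loc[df["sun"].ne(0), "sun"]
    .diff()
    .fillna(0)
    .reindex(df.index, fill_value=0)
)

# Approach 2: trim only the leading and trailing zeros, keep interior ones.
s = df["sun"].where(df["sun"].ne(0))  # zeros become NaN
start, end = s.first_valid_index(), s.last_valid_index()
trimmed = (
    df.loc[start:end, "sun"]          # label slice, inclusive on both ends
    .diff()
    .fillna(0)
    .reindex(df.index, fill_value=0)
)

print(pd.DataFrame({"masked": masked, "trimmed": trimmed}))
```

Here `masked` reports 13 at 03:30 (134 minus 121) and 0 for the interior zero, whereas `trimmed` keeps the real drop and rise around it (-121, then 134). Which one is right depends on whether an interior zero is a genuine reading or a gap.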
