R noob (still) here, working in tidyverse / RStudio.
I have a tidy dataset where each row has a date, a grouping characteristic, and a value (actual dataset more complicated but that's the core of it):
I group the data by Group for each Date, and calculate some summary stats of the Value, yielding a by-group summary for each date. For instance:
grouped <- data %>% group_by(Date, Group) %>% summarise(mean = mean(Value))
head(grouped)
# A tibble: 6 × 3
# Groups: Date [4]
Date Group mean
<date> <fct> <dbl>
1 2021-02-18 A 37.4
2 2021-02-19 B 25.5
3 2021-02-19 A 26.1
4 2021-02-22 B 34.2
5 2021-02-22 A 26.4
6 2021-02-23 B 34.2
(Note: data is below for reproducibility.)
So far so good. Now I want to take the moving average of those summary stats (mean in this case, but could be others) by Group. I tried this with zoo::rollmean:
grouped <- grouped %>%
group_by(Group) %>%
mutate(rolling = zoo::rollmean(mean, window_length, fill=NA))
But here a problem arises - ideally, the moving average should be strictly some number of days, not records, but there are some days missing for one or both groups.
What's the best way to ensure that the moving average correctly takes into account the missing days x groups, treating them as NA as needed?
(I understand from this answer that zoo::rollmean wouldn't be able to handle NA values, but zoo::rollapply should be able to.)
I have tried creating a simple calendar dataframe with the full set of dates to join the grouped data to, but that leaves the Group variable as NA as well, so the missing days x groups are still ignored by the rollmean / rollapply function.
Hope that all makes sense!
data <- structure(list(Date = structure(c(18676, 18677, 18677, 18680,
18680, 18680, 18680, 18680, 18680, 18680, 18680, 18680, 18680,
18680, 18680, 18681, 18681, 18681, 18681, 18681, 18681, 18681,
18681, 18681, 18681, 18681, 18681, 18681, 18681, 18681, 18681,
18681, 18681, 18681, 18682, 18682, 18682, 18682, 18682, 18683,
18683, 18683, 18683, 18683, 18683, 18683, 18683, 18683, 18683,
18683, 18683, 18683, 18684, 18684, 18684, 18684, 18684, 18684,
18684, 18684, 18684, 18684, 18684, 18685, 18685, 18685, 18685,
18685, 18685, 18685, 18685, 18685, 18685, 18685, 18687, 18687,
18687, 18687, 18687, 18687, 18687, 18687, 18687, 18688, 18688,
18688, 18688, 18688, 18688, 18688, 18688, 18688, 18689, 18689,
18689, 18689, 18689, 18689, 18690, 18690, 18690, 18690, 18690,
18690, 18690, 18690, 18691, 18691, 18691, 18691, 18691, 18691,
18691, 18691, 18691, 18691, 18692, 18692, 18692, 18692, 18692,
18692, 18692, 18692, 18692, 18692, 18692, 18692, 18693, 18694,
18694, 18694, 18694, 18694, 18694, 18694, 18694, 18694, 18694,
18694, 18694, 18695, 18695, 18695, 18695, 18695, 18695, 18695,
18695, 18695, 18696, 18696, 18696, 18696, 18696, 18696, 18696,
18696, 18696, 18697, 18697, 18697, 18697, 18697, 18697, 18697,
18697, 18697, 18698, 18698, 18698, 18698, 18698, 18698, 18698,
18698, 18698, 18699, 18699, 18699, 18699, 18699, 18699, 18699,
18699, 18699, 18699, 18699, 18699, 18699, 18699, 18699, 18699,
18699, 18699, 18699, 18700, 18701, 18701, 18701, 18701, 18701,
18701, 18701, 18701, 18701, 18701, 18701, 18701, 18701, 18701,
18701, 18702, 18702, 18702, 18702, 18702, 18702, 18702, 18702,
18702, 18702, 18702, 18702, 18702, 18702, 18702, 18702, 18702,
18702, 18702, 18703, 18703, 18703, 18703, 18703, 18703, 18703,
18703, 18703, 18703, 18703, 18703, 18703, 18703, 18703, 18703,
18703, 18703, 18703), class = "Date"), Group = structure(c(2L,
2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L,
2L, 2L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 1L, 1L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 2L,
2L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 1L,
2L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 2L,
2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 1L,
1L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L,
2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 2L,
2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 2L,
1L, 1L, 2L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 1L, 2L, 2L,
1L, 1L, 1L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 2L,
1L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), levels = c("B", "A"), class = "factor"),
Value = c(37.43, 26.13, 25.54, 31.65, 26.95, 15.29, 35.93,
28.59, 17.14, 30.42, 20.52, 33.4, 35.3, 36.87, 28.32, 21.78,
25.49, 34.13, 20.35, 40.21, 16, 24.58, 23.61, 38.94, 36.76,
29.68, 15.97, 20.79, 17.83, 14.65, 16.76, 35.74, 31.5, 25.6,
32.96, 14.1, 40.4, 24.53, 39.57, 21.38, 14.49, 22.11, 27.12,
16.46, 17.65, 37.32, 15.74, 17.07, 28.52, 14.72, 27.75, 36.69,
39.47, 26.13, 35.57, 24.08, 24.39, 13.1, 16.75, 24.49, 23.61,
15.04, 23.22, 37.3, 36.76, 15.77, 28.34, 35.06, 28.32, 29.39,
19.09, 35.68, 35.9, 37.13, 36.1, 40.55, 33.97, 24.03, 37.25,
34.39, 13.05, 21.64, 40.02, 26.17, 19.39, 25.76, 40.92, 24.21,
20.35, 27.7, 29.53, 14.19, 15.64, 32.74, 31.42, 14.01, 12.85,
17.31, 31.67, 23.63, 17.29, 36.71, 18.19, 17.78, 34.87, 36.87,
19.27, 24.97, 41.66, 16.83, 34.79, 14.94, 34.39, 40.66, 31.35,
31.74, 36.19, 18.28, 37.61, 37.19, 29.58, 17.04, 28.84, 16.6,
41.97, 32.36, 27.91, 21.7, 40.45, 35.38, 41.19, 35.68, 19.49,
20.94, 23.99, 14.28, 39.24, 12.19, 18.02, 39.14, 40.61, 33.32,
38.68, 39.18, 31.76, 22.64, 38.18, 36.75, 30.91, 38.82, 30.68,
14.2, 39.34, 18.91, 12.7, 28.2, 37.85, 34.06, 12.88, 40.03,
29.95, 14.61, 17.01, 35.64, 20.49, 39.51, 29.29, 18.84, 36.42,
37.88, 32.65, 19.7, 19.84, 38.75, 21.25, 40.68, 17.89, 26.3,
37.22, 18.03, 17.33, 36.26, 41.98, 19.4, 20.54, 18.6, 26.92,
15.23, 20.22, 15.2, 35.49, 15.14, 14.43, 30.82, 14.79, 17.74,
36.8, 17.09, 18.09, 19.92, 39.64, 23.87, 22.67, 24.66, 24.33,
16.82, 17.91, 21.66, 30.79, 32.91, 25.16, 38.98, 15.49, 21.33,
38.47, 34.46, 24.22, 36.93, 22.25, 15.33, 41.38, 34.49, 23.44,
30.53, 10.62, 23.8, 28.94, 12.49, 22, 24.51, 14.72, 15.53,
23.23, 38.93, 16.06, 19.36, 35.91, 22.2, 15.85, 33.36, 31.75,
19.69, 29.86, 16.3, 19.42, 19.17, 14.41, 13.18, 20.67, 17.02
)), row.names = c(NA, -250L), class = c("tbl_df", "tbl",
"data.frame"))