I have the following dataset:
Data:
I would like to sum, by group, the values of all events that started within 60 days (already calculated interval = Start_interval, End_interval) without adding the same row in more than one interval.
Expected output:
I did research and found some solutions that allowed me to obtain, until then, the results displayed below, very close to what I expected. For example: In R: how to sum a variable by group between two dates
The difference between the questions is that I need the sum to be performed without adding, by group, the same row in more than one period.
Result I got so far:
Does anyone have any suggestions?
Reproducible example:
# Input data
data <- data.table(id = c("Group A", "Group A", "Group A", "Group A",
                          "Group A", "Group A"),
                   start_date_event = c("2019-09-15",
                                        "2019-11-24",
                                        "2020-04-19",
                                        "2020-04-25",
                                        "2020-05-25",
                                        "2020-10-27"),
                     end_date_event = c("2019-09-24",
                                        "2019-11-28",
                                        "2020-04-23",
                                        "2020-04-29",
                                        "2020-05-27",
                                        "2020-11-06"),
                     start_interval = c("2019-09-15",
                                        "2019-11-24",
                                        "2020-04-19",
                                        "2020-04-25",
                                        "2020-05-25",
                                        "2020-10-27"),
                       end_interval = c("2019-11-14",
                                        "2020-01-23",
                                        "2020-06-18",
                                        "2020-06-24",
                                        "2020-07-24",
                                        "2020-12-26"),
                     value = c(9, 4, 4, 4, 2, 15))
# Convert to date
data <- data %>%
        dplyr::mutate(start_date_event = as.Date(start_date_event)) %>%
        dplyr::mutate(end_date_event = as.Date(end_date_event)) %>%
        dplyr::mutate(start_interval = as.Date(start_interval)) %>%
        dplyr::mutate(end_interval = as.Date(end_interval))
# Calculating with non-equi join
temp <- data[data,
          on = .(start_date_event <= end_interval,
                 end_date_event >= start_interval)][,
          .(value_sum = sum(value)),
          by = .(id, start_date_event)]
# Get all
data <- merge(data, temp, all.x = T,
              by.x = c("id", "end_interval"),
              by.y = c("id", "start_date_event"))
Thanks!



 
     
    