I have a dataframe containing repeated measurements of a number of variables for a large number of samples. I would like to look at each variable / sample combination, compare the measurement dates with the dates in another column, and only keep the row where the minimum date difference exists.
start_date <- as.Date('2021-01-01')  
end_date <- as.Date('2021-12-30') 
set.seed(1984)
# The dataframe looks something like this
cheese <- data.frame(sample_id = c('1','1','1','1','2','2','2','2','1','1','1','1','2','2','2','2'),
                 variable = c('a','a','b','b','a','a','b','b',
                          'a','a','b','b','a','a','b','b'),
             value = runif(n = 16, min = 1, max = 10),
             measurement_date = c('2021-05-04','2021-08-22','2021-05-04','2021-08-22',
                                  '2021-05-04','2021-08-22','2021-05-04','2021-08-22',
                                  '2021-05-04','2021-08-22','2021-05-04','2021-08-22',
                                  '2021-05-04','2021-08-22','2021-05-04','2021-08-22'),
             date2 = as.Date(sample( as.numeric(start_date): as.numeric(end_date), 16, 
                                     replace = T), 
                             origin = '1970-01-01'))
And I'd like it to end up like this:
sample_id   variable   measurement_date    date2
1           a          2021-05-04          2021-06-08
1           b          2021-05-04          2021-03-21
2           a          2021-05-04          2021-01-27
2           b          2021-05-04          2021-03-15
1           a          2021-08-22          2021-09-10
1           b          2021-08-22          2021-03-22
2           a          2021-08-22          2021-11-27
2           b          2021-08-22          2021-08-13
I know that the answer falls somewhere in the realm of this similar question, but my example has a different structure that I can't seem to wrap my head around. I've tried using dplyr's filter option, but my solution only returns a single row where the smallest difference across all samples occurs. It doesn't do this for every sample / variable combination.
library(dplyr)
filtered <- cheese %>% filter(abs(difftime(measurement_date,date2)) == min(abs(difftime(measurement_date,date2))))
