I am working with cattle fertility data. In one table (data frame), what I have is a record of all the services performed in a cow (like inseminations). In a different table, I get the pregnancy diagnosis (positive or negative). Both have an unique ID (animal_id). My challenge has been successfully merging both tables in the right data range, meaning what I need is the pregnancy check associated with the right insemination record. Here is a sample of how both tables look like,
animal_id     service_date
610710        2005-10-22
610710        2006-12-03
610710        2006-12-27
610710        2007-12-02
610710        2008-01-17
610710        2008-03-04
The other table is the same but with a different date (event_date) and the diagnosis,
 animal_id     event_date        event_description
    610710     2006-06-16           PP
    610710     2007-02-15           PP
    610710     2008-01-09           PN
    610710     2008-04-09           PP
    610710     2009-06-16           PP
So what I would like to do is merge both tables in a way the dates complement each other, meaning if a service was performed on 2005-10-12, when I join both tables this row will link to the closest date in the Events table, and by closest I also mean later - since insemination happens before diagnosis. So the desired output would be something like this,
    animal_id    service_date       event_date     event_description
 1   610710       2005-10-22              NA               NA
 2   610710           NA              2006-06-16           PP
 3   610710       2006-12-03          2007-02-15           PP
 4   610710       2006-12-27          2007-02-15           PP
 5   610710       2007-12-02          2008-01-09           PN
 6   610710       2008-01-17          2008-04-09           PP
 7   610710       2008-03-04              NA               NA  
 8   610710           NA              2009-06-16           PP 
In the final output, I would expect a large number of records not to merge against anything, like row 1 in the example output. There was a service performed in October 2005, but the first Diagnosis I have for that cow is in June 2006 - there are probably a number of service records missing. That is unfortunately to be expected. For this example, only rows 5 and 6 make sense. For rows 3 and 4, I would consider only row 4, since that is probably the insemination that resulted into pregnancy.
Is that even possible in R?
Thank you!
 
     
    