I have a dataframe x which has two columns. Let's say they are A and B columns. A column are dates, B column are values that taken on that date. Both A and B column are continuous, no missing dates, and range from Jan to Oct in days. Right now I have another dataframe y which has column A and column C. A are also dates, C are values taken by another instrument on that specific date. However, A and C in this dataframe is not continuous (there are missing dates). For example, I have value on 1/1 but next value will be on 1/10. Now I would like to join values of C in dataframe y (discrete dates) to the first dataframe x which has continuos dates, according to the common A column, is there a efficient way to do this?
            Asked
            
        
        
            Active
            
        
            Viewed 161 times
        
    1 Answers
4
            You could use merge from base R
 res1 <- merge(x,y, by="A", all.x=TRUE)
 dim(res1)
 #[1] 10  3
 head(res1,3)
 #          A          B         C
 #1 2011-04-03  1.3709584 0.2101654
 #2 2011-04-04 -0.5646982        NA
 #3 2011-04-05  0.3631284        NA
or left_join from dplyr (should be fast on big datasets)
 library(dplyr)
 res2 <- left_join(x,y, by="A") %>% #in case `date` column gets coerced to `numeric`
                        mutate(A= as.Date(A, origin='1970-01-01'))
 all.equal(res1, as.data.frame(res2))
 #[1] TRUE
Data
  set.seed(42)
  x <- data.frame(A=seq(as.Date("2011-04-03"), length.out=10, by=1), B=rnorm(10))
  set.seed(384)
  y <- data.frame(A=seq(as.Date("2011-04-03"), length.out=6, by=3), C=rnorm(6))
 
    
    
        akrun
        
- 874,273
- 37
- 540
- 662
- 
                    Thank you very much! It works! But instread of using `All.x=TRUE`, I used `ALL=TURE`. – Xin Qiao Oct 07 '14 at 19:20
