I know I can use plyr and its friends to combine data frames, and merge as well, but so far I don't know how to merge two data frames that have multiple columns based on two of those columns.
        Brian Tompsett - 汤莱恩
        
 
    
    
        Sam
        
3 Answers
See the documentation on ?merge, which states:
By default the data frames are merged on the columns with names they both have, but separate specifications of the columns can be given by by.x and by.y.
This clearly implies that merge will merge data frames based on more than one column. From the final example given in the documentation:
x <- data.frame(k1=c(NA,NA,3,4,5), k2=c(1,NA,NA,4,5), data=1:5)
y <- data.frame(k1=c(NA,2,NA,4,5), k2=c(NA,NA,3,4,5), data=1:5)
merge(x, y, by=c("k1","k2")) # NA's match
This example was meant to demonstrate the use of incomparables, but it illustrates merging using multiple columns as well. You can also specify separate columns in each of x and y using by.x and by.y.
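To make the result concrete, here is a minimal sketch that re-runs the documentation example and inspects what comes back. The variable name `merged` is my own and not part of the docs. Since both inputs also carry a non-key column called data, merge should disambiguate the two copies with its default suffixes .x and .y.

```r
# Data from the final example in ?merge
x <- data.frame(k1 = c(NA, NA, 3, 4, 5), k2 = c(1, NA, NA, 4, 5), data = 1:5)
y <- data.frame(k1 = c(NA, 2, NA, 4, 5), k2 = c(NA, NA, 3, 4, 5), data = 1:5)

# A row is kept only when BOTH k1 and k2 agree between x and y
merged <- merge(x, y, by = c("k1", "k2"))

names(merged)  # "k1" "k2" "data.x" "data.y"
nrow(merged)   # 3: the (4, 4), (5, 5) and (NA, NA) key pairs
```

Rows such as k1 = 3, k2 = NA in x are dropped because no row of y has the same pair of key values.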
 
    
    
        joran
        
@darkage This question deals with merging data frames. Looks like you have data.tables. Totally different. I would read the documentation for data.table. – joran May 13 '14 at 21:47
            
            
Hope this helps:
df1 <- data.frame(CustomerId = 1:10,
                  Hobby = c(rep("sing", 4), rep("pingpong", 3), rep("hiking", 3)),
                  Product = c(rep("Toaster", 3), rep("Phone", 2), rep("Radio", 3), rep("Stereo", 2)))
df2 <- data.frame(CustomerId = c(2, 4, 6, 8, 10),
                  State = c(rep("Alabama", 2), rep("Ohio", 1), rep("Cal", 2)),
                  like = c("sing", "hiking", "pingpong", "hiking", "sing"))
# Pair the key columns positionally: CustomerId with CustomerId, Hobby with like
df3 <- merge(df1, df2, by.x = c("CustomerId", "Hobby"), by.y = c("CustomerId", "like"))
This assumes df1$Hobby and df2$like mean the same thing.
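As a rough check of what this merge returns, the sketch below rebuilds the same two data frames (using rep with a times vector, which is only a shorter way of writing the same data) and looks at the output. The all.x = TRUE variant and the name `df_all` go beyond what the answer shows; they are included as an assumption about what a reader might want next, namely keeping unmatched rows of df1.

```r
df1 <- data.frame(CustomerId = 1:10,
                  Hobby = rep(c("sing", "pingpong", "hiking"), times = c(4, 3, 3)),
                  Product = rep(c("Toaster", "Phone", "Radio", "Stereo"), times = c(3, 2, 3, 2)))
df2 <- data.frame(CustomerId = c(2, 4, 6, 8, 10),
                  State = rep(c("Alabama", "Ohio", "Cal"), times = c(2, 1, 2)),
                  like = c("sing", "hiking", "pingpong", "hiking", "sing"))

df3 <- merge(df1, df2, by.x = c("CustomerId", "Hobby"), by.y = c("CustomerId", "like"))

# Key columns take their names from by.x, so `like` does not appear in the result
names(df3)       # "CustomerId" "Hobby" "Product" "State"
# Only customers whose Hobby equals like survive: customers 2, 6 and 8
df3$CustomerId

# all.x = TRUE keeps every row of df1 and fills State with NA where nothing matched
df_all <- merge(df1, df2, by.x = c("CustomerId", "Hobby"),
                by.y = c("CustomerId", "like"), all.x = TRUE)
nrow(df_all)     # 10
```

Customers 4 and 10 appear in both data frames but are dropped from df3, because their Hobby and like values differ.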
 
    
    
        Ɖiamond ǤeezeƦ
        
 
    
    
        Hyunbong Lee
        
            
            
You can also use one of the join functions from dplyr.
For example:
new_dataset <- dataset1 %>% right_join(dataset2, by=c("column1","column2"))
For those who want to merge data frames and keep only the records that match on the specified columns, use `inner_join` instead of `right_join`. – crwang Jan 03 '19 at 14:17
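To illustrate the difference between the two joins mentioned here, this is a small sketch that assumes the dplyr package is installed. The data and the column names id, year, sales and region are made up for the example; only the names dataset1 and dataset2 come from the answer.

```r
library(dplyr)

# Hypothetical data: both tables share the two key columns id and year
dataset1 <- data.frame(id = c(1, 2, 3), year = c(2020, 2020, 2021), sales = c(10, 20, 30))
dataset2 <- data.frame(id = c(2, 3, 4), year = c(2020, 2021, 2021), region = c("N", "S", "E"))

# right_join keeps every row of dataset2; sales is NA where dataset1 has no match
right <- dataset1 %>% right_join(dataset2, by = c("id", "year"))
nrow(right)  # 3

# inner_join keeps only rows whose (id, year) pair appears in both tables
inner <- dataset1 %>% inner_join(dataset2, by = c("id", "year"))
nrow(inner)  # 2
```

In this sketch the row with id 4 appears only in the right_join result, since it has no partner in dataset1.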
 
    