I am aware that one can merge (join) two data.table with the merge function or the [.data.table function.  However, if I have say 10, data.tables and want to use do.call to merge them all together, is there a function that would do this? Currently I resort to do.call(cbind, ...) which works for only very special cases.
            Asked
            
        
        
            Active
            
        
            Viewed 3.9k times
        
    55
            
            
        4 Answers
67
            Not sure, but maybe (untested) :
Reduce(merge,list(DT1,DT2,DT3,...))
 
    
    
        Matt Dowle
        
- 58,872
- 22
- 166
- 224
- 
                    3The reduce solution works nice, but I don't know how to specify all=TRUE. Any idea? – sdaza Feb 26 '13 at 19:15
- 
                    6@sdaza Hm good question. All I can think currently is to create a wrapper like `mymerge = function(x,y) merge.data.table(x,y,all=TRUE)` and pass that wrapper to `Reduce` instead, i.e. `Reduce(mymerge,list(DT1,DT2,DT3,...))`. – Matt Dowle Feb 26 '13 at 19:35
- 
                    1[the all = T](https://stackoverflow.com/questions/8091303/simultaneously-merge-multiple-data-frames-in-a-list) worked for me in data.table. – Henk Feb 18 '16 at 14:59
- 
                    4If your `data.table`s are unkeyed, the wrapper should look like `function(x, y) x[y, on = "merge_column_name"]` – MichaelChirico Jan 13 '17 at 21:34
39
            
            
        To flesh out the solution suggested by @Henk in comments, here is how to use this to select the all = TRUE option in a concise formulation:
MergedDT = Reduce(function(...) merge(..., all = TRUE), List_of_DTs)
 
    
    
        Michael Ohlrogge
        
- 10,559
- 5
- 48
- 76
- 
                    This method works well, but I also like to add: ```R Reduce(function(...) merge(..., all=TRUE, sort=FALSE), .)``` because allowing sort to be the default value of TRUE will rekey to the shared key columns. This can break future merges, if for example you've created a list of data.tables with the appropriate keys and you want to merge them all on any shared keys. – jFrostad Apr 17 '23 at 20:04
8
            
            
        The latest version of data.table (1.10.5) has an easy syntax for table join, which can also be chained.
 set(DT1, key)
 set(DT2, key) 
  ...
 set(DT10, key)
 DT_cmb <- DT1[DT2,][DT3,]...[DT10,]
 
    
    
        HappyCoding
        
- 5,029
- 7
- 31
- 51
- 
                    1This solution seems good, but it can be exhaustive if you have several `data.table`'s. Probably, there is an easy (and more generic) way to do it using `lapply` and `ls(pattern = '^DT')` instead. – lcgodoy Feb 07 '19 at 00:57
- 
                    @lcgodoy Yeah, I like to use `lapply()` along with `setkeyv( )` in these situations, especially since I'll already want the `data.table`s in a list to use my preferred [merge syntax](https://stackoverflow.com/a/39561045/3541976) – Michael Ohlrogge Feb 07 '19 at 21:22
- 
                    this syntax is there for ages, and chaining works for ages as well, the problem is that it does not scale, as you might have a list of data.table that varies in quantity – jangorecki Apr 10 '20 at 08:53
0
            
            
        To fresh answer from Michael Ohlrogge, here I create a function to do this task. You can inherit more arguments from merge.data.table()
mergeDTs <- function(dt_list, by = NULL, sort = FALSE) {
  Reduce(
    function(...) {
      merge(..., by = by, all = TRUE, sort = sort)
    }, dt_list)
}
 
    
    
        Shixiang Wang
        
- 2,147
- 2
- 24
- 33
 
    