I am a fairly novice R user, and I am wondering how to automate finding matching rows (i.e., an inner-join merge) between one "master" data.table and multiple large data.tables (~8 GB files each), and then collect those matches in one final table.
The "master" data.table has two columns and looks like this:
pid    pkid
123    8975A
456    9683
While the multiple, larger data.tables look like this:
pid    pkid   pha
123    8975A  3
456    9683   2
789    1253   3
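For reproducibility, here is how the example tables above could be constructed (master and dt1 are just stand-ins for my real tables):

library(data.table)
master <- data.table(pid = c(123, 456), pkid = c("8975A", "9683"))
dt1 <- data.table(pid = c(123, 456, 789),
                  pkid = c("8975A", "9683", "1253"),
                  pha = c(3, 2, 3))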
I have my code set up to read all of the pipe-delimited .txt files from one directory, set the key for the data.table merges, and build object names from the file names:
library(data.table)
temp <- list.files(pattern = "\\.txt$")
# read each pipe-delimited file and key it on the join columns
list2env(
  lapply(setNames(temp, make.names(gsub("\\.txt$", "", temp))),
         function(f) setkey(fread(f, sep = "|"), pid, pkid)),
  envir = .GlobalEnv)
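I've also wondered whether it would be cleaner to keep the tables in a named list instead of assigning each one into the global environment, since a list seems easier to loop over. An untested sketch of that alternative, assuming the same pid/pkid key columns:

# keep everything in one named list instead of ~40 global objects
dts <- lapply(setNames(temp, make.names(gsub("\\.txt$", "", temp))),
              function(f) setkey(fread(f, sep = "|"), pid, pkid))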
Where I'm stuck is how to automate the numerous (forty-odd?) merges between the one "master" data.table and the many large tables, and to dump all of the matches into a single separate data.table. Basically, I want every row in the larger data.tables whose pid and pkid match the "master" to come through with its corresponding pha.
The only way I've come up with is tragically manual: it would take a long time to write out and is undoubtedly highly inefficient.
res1 <- merge(master, dt1)
res2 <- merge(master, dt2)
# ...and so on, ~40 times...
l <- list(res1, res2)
final <- rbindlist(l)
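What I'm imagining instead is something like the sketch below, which loops over the tables and row-binds the results, though I have no idea whether it's correct or efficient (dts is the hypothetical named list of tables from above):

# untested: inner-join each large table to master, then stack the matches
results <- lapply(dts, function(dt) merge(master, dt))
final <- rbindlist(results, use.names = TRUE)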
Any thoughts? Thanks!
