Assigning values from one data table to the other with a join

Question

I have a largish master data.table and different processes each modifying different subsets of it. Then i need to update the master table with the changes made by each process.

This is an example:

# Master table
dtA <- data.table(month=month.abb[1:5],act=letters[1:5],col1=(1:5)^2, col2=(5:9)^3)

# "subsidiary" table    
dtB <- data.table(month=month.abb[3:4],act=letters[3:4],col1=(6:7)^3)

setkey(dtA, month,act)
setkey(dtB, month,act)

dtA
   month act col1 col2
1:   Apr   d   16  512
2:   Feb   b    4  216
3:   Jan   a    1  125
4:   Mar   c    9  343
5:   May   e   25  729

dtB
   month act col1
1:   Apr   d  343
2:   Mar   c  216

# The result I want is:
   month act col1 col2
1:   Apr   d  343  512
2:   Feb   b    4  216
3:   Jan   a    1  125
4:   Mar   c  216  343
5:   May   e   25  729

Almost by chance and with my great surprise, as I haven't seen this documented elsewhere, I found this incredibly simple way to get there:

dtA[dtB, col1:=i.col1]

My question: is this a legitimate way to solve my problem and what would be the syntax to do this dynamically, e.g. if I have many columns to set at the same time into my master table?

Yes, this is exactly the way to do this. If you want to do add more columns you would do something like ````dtA[dtB, `:=`(col1 = i.col1, col2 = i.col2,..., coln = i.coln)]````. I'm wondering how did you discover this by chance? — David Arenburg, Feb 10 '15 at 12:39
@david I may have missed it, but I couldn't find any reference to i.columns in data.table docs as a result of a join (but of course it is used elsewhere in R e.g. for merge). It is just one of the many things I tried and (the only one that) worked! — Enzo, Feb 10 '15 at 13:01
@eddi I didn't see the question you are referring to. Even after hours of searching so and any data.table document I have (FAQ, cheat sheet, intro, data.table main doc in CRAN) I couldn't find an answer. Why the i.col notation is so little documented and used in examples?? — Enzo, Feb 10 '15 at 16:35
@Enzo that's a good question, and I originally came upon that notation by accident. If you have specific suggestions on how it can be incorporated into any of those docs you should post them on github. — eddi, Feb 10 '15 at 17:32

score 0 · Answer 1 · edited Apr 18 '20 at 02:24

0

One approach to dynamically update multiple columns is to use assign by reference and select your columns from either data.table that you want updating using indices. This will save you from having to type out assignments for all column names.

The example here includes an additional column in dtA to demonstrate unequal tables:

dtA <- data.table(month=month.abb[1:5],act=letters[1:5],col1=(1:5)^2, col2=(8:12)^2, col3=(13:17)^2)
dtB <- data.table(month=month.abb[3:4],act=letters[3:4],col1=(6:7)^3, col2=(13:14)^3)

setkey(dtA, month,act)
setkey(dtB, month,act)

dtA
   month act col1 col2 col3
1:   Apr   d  343 2744  256
2:   Feb   b    4   81  196
3:   Jan   a    1   64  169
4:   Mar   c  216 2197  225
5:   May   e   25  144  289

dtB
   month act col1 col2
1:   Apr   d  343 2744
2:   Mar   c  216 2197

# columns 3:4 are selected for updating
dtA[dtB[,list(month,act)], 3:4 := (dtB[, 3:4])]


dtA
   month act col1 col2 col3
1:   Apr   d  343 2744  256
2:   Feb   b    4   81  196
3:   Jan   a    1   64  169
4:   Mar   c  216 2197  225
5:   May   e   25  144  289

edited Apr 18 '20 at 02:24

MichaelChirico

33,841
14
113
198

answered Feb 10 '15 at 12:59

MattLBeck

5,701
7
40
56

that looks nice, but I though the mantra was to always assign by reference? – Enzo Feb 10 '15 at 13:04
Actually, your method is "risky" if dtA has more data columns than dtB: they will get NA. I will edit the original question to cater for this use case. – Enzo Feb 10 '15 at 13:18
@Enzo see the updated answer for an alternative assignment by reference solution. Yes, I assumed that `dtA` would have the same columns as `dtB`, since `dtB` is derived from `dtA`. – MattLBeck Feb 10 '15 at 13:21
dtA as the master table has actually more columns that dtB: my apologies, I simplified my example too much! – Enzo Feb 10 '15 at 13:22
@Enzo I think the second solution might solve that second problem anyway, seeing as you are specifying a particular range. It's not that different to what was suggested in the comments, but you at least don't have to assign each column separately. – MattLBeck Feb 10 '15 at 13:23
I would love to know why my answer was worth a downvote. It's a valid working solution as far as I can tell. – MattLBeck Feb 10 '15 at 16:55
Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/70651/discussion-between-enzo-and-mattrition). – Enzo Feb 10 '15 at 17:25

Assigning values from one data table to the other with a join

1 Answers1