Update subset of data.table based on join using data.table 1.9.3 does not work any more

Question

I'm sorry to ask a new question, but I am not allowed leave comments and the post as answer was deleted.

Trying to reproduce the example from Update subset of data.table based on join gives a different result (not expected) using 1.9.3 under R 3.1.0. I know that 1.9.3 is unstable, but 1.9.2 is not working for me because of other problems - see http://r.789695.n4.nabble.com/Change-in-list-behavior-inside-join-td4687469.html

Citation from dnlbrky here

I have two data tables, DT1 and DT2:

set.seed(1)
DT1<-data.table(id1=rep(1:3,2),id2=sample(letters,6), v1=rnorm(6), key="id2")
DT1
##    id1 id2         v1
## 1:   2   e  0.7383247
## 2:   1   g  1.5952808
## 3:   2   j  0.3295078
## 4:   3   n -0.8204684
## 5:   3   s  0.5757814
## 6:   1   u  0.4874291

DT2<-data.table(id2=c("n","u"), v1=0, key="id2")
DT2
##    id2 v1
## 1:   n  0
## 2:   u  0

I would like to update DT1 based on a join with DT2, but only for a subset of DT1. For example, for DT1[id1==3], I would expect the value of v1 in row 4 to be updated as in the following result:

DT1
##    id1 id2         v1
## 1:   2   e  0.7383247
## 2:   1   g  1.5952808
## 3:   2   j  0.3295078
## 4:   3   n          0
## 5:   3   s  0.5757814
## 6:   1   u  0.4874291

I know how to update a table (using the := assignment operator), how to join the tables (DT1[DT2]), and how to subset a table (DT1[id1==3]). However I'm not sure how to do all three at once.

EDIT: Note that the original example only attempts to update one column, but my actual data requires updating many columns. Consider the additional scenarios in DT1b and DT2b:

set.seed(2)
DT1b<-DT1[,v2:=rnorm(6)] # Copy DT1 and add a new column
setkey(DT1b,id2)
DT1b
##    id1 id2         v1          v2
## 1:   2   e  0.7383247 -0.89691455
## 2:   1   g  1.5952808  0.18484918
## 3:   2   j  0.3295078  1.58784533
## 4:   3   n -0.8204684 -1.13037567
## 5:   3   s  0.5757814 -0.08025176
## 6:   1   u  0.4874291  0.13242028

DT2b<-rbindlist(list(DT2,data.table(id2="e",v1=0))) # Copy DT2 and add a new row
DT2b[,v2:=-1] # Add a new column to DT2b
setkey(DT2b,id2)
DT2b
##    id2 v1 v2
## 1:   e  0 -1
## 2:   n  0 -1
## 3:   u  0 -1

Based on the helpful answers from @nmel and @BlueMagister, I came up with this solution for the updated scenario:

DT1b[DT2b[DT1b[id1 %in% c(1,2)],nomatch=0],c("v1","v2"):=list(i.v1,i.v2)]
DT1b
##    id1 id2         v1          v2
## 1:   2   e  0.0000000 -1.00000000
## 2:   1   g  1.5952808  0.18484918
## 3:   2   j  0.3295078  1.58784533
## 4:   3   n -0.8204684 -1.13037567
## 5:   3   s  0.5757814 -0.08025176
## 6:   1   u  0.0000000 -1.00000000

End Citation

This is reproducible using data.table 1.8.10, but it is not working any more using data.table 1.9.3 @ R 3.1.0. The result is instead:

DT1b[DT2b[DT1b[id1 %in% c(1,2)],nomatch=0],c("v1","v2"):=list(i.v1,i.v2)]
DT1b
    id1 id2         v1          v2
## 1:   2   e  0.7383247 -0.89691455
## 2:   1   g  1.5952808  0.18484918
## 3:   2   j  0.3295078  1.58784533
## 4:   3   n -0.8204684 -1.13037567
## 5:   3   s  0.5757814 -0.08025176
## 6:   1   u  0.4874291  0.13242028

due to

DT1b[DT2b[DT1b[id1 %in% c(1,2)],nomatch=0], list(i.v1)]
##   id1 id2        v1         v2 i.v1 i.v2 i.id1      i.v1       i.v2
## 1:   2   e 0.7383247 -0.8969145    0   -1     2 0.7383247 -0.8969145
## 2:   1   u 0.4874291  0.1324203    0   -1     1 0.4874291  0.1324203

Does anybody have a suggestion, how to fix it?

nachti

EDIT:

If I fix the code due to 1.9.3 (see @arun), it's not working in 1.8.11 any more. That's a problem, because it's implemented in a funcion in a package used with different versions of R and data.table. The change of the defaults in 1.9.3 breaks existing code, which shoud not be (see. DT FAQ 1.8). Would be fine if there is a possibility that code works with different versions of DT.

I think you should read the [NEWS](https://r-forge.r-project.org/scm/viewvc.php/pkg/NEWS?view=markup&root=datatable) to see the changes in 1.9.3. There's no *implicit by-without-by* anymore. Try: `DT1b[DT2b[DT1b[id1 %in% c(1,2)],nomatch=0],c("v1","v2"):=list(i.v1,i.v2), by=.EACHI]` — Arun, Apr 25 '14 at 10:30
Thanks Arun! Your code is working well! I read the NEWS of course. The problem is just to find out all occurrences of such merges in my code, which I should fix, that it's working with 1.9.3 ... — nachti, Apr 25 '14 at 11:05
Ok. I'll switch to [datatable-help](http://r.789695.n4.nabble.com/datatable-help-f2315188.html). — nachti, May 07 '14 at 08:14

Update subset of data.table based on join using data.table 1.9.3 does not work any more

0 Answers0