I'm sorry to ask a new question, but I am not allowed leave comments and the post as answer was deleted.
Trying to reproduce the example from Update subset of data.table based on join gives a different result (not expected) using 1.9.3 under R 3.1.0. I know that 1.9.3 is unstable, but 1.9.2 is not working for me because of other problems - see http://r.789695.n4.nabble.com/Change-in-list-behavior-inside-join-td4687469.html
I have two data tables, DT1 and DT2:
set.seed(1)
DT1<-data.table(id1=rep(1:3,2),id2=sample(letters,6), v1=rnorm(6), key="id2")
DT1
## id1 id2 v1
## 1: 2 e 0.7383247
## 2: 1 g 1.5952808
## 3: 2 j 0.3295078
## 4: 3 n -0.8204684
## 5: 3 s 0.5757814
## 6: 1 u 0.4874291
DT2<-data.table(id2=c("n","u"), v1=0, key="id2")
DT2
## id2 v1
## 1: n 0
## 2: u 0
I would like to update DT1 based on a join with DT2, but only for a subset of DT1. For example, for DT1[id1==3], I would expect the value of v1 in row 4 to be updated as in the following result:
DT1
## id1 id2 v1
## 1: 2 e 0.7383247
## 2: 1 g 1.5952808
## 3: 2 j 0.3295078
## 4: 3 n 0
## 5: 3 s 0.5757814
## 6: 1 u 0.4874291
I know how to update a table (using the := assignment operator), how to join the tables (DT1[DT2]), and how to subset a table (DT1[id1==3]). However I'm not sure how to do all three at once.
EDIT: Note that the original example only attempts to update one column, but my actual data requires updating many columns. Consider the additional scenarios in DT1b and DT2b:
set.seed(2)
DT1b<-DT1[,v2:=rnorm(6)] # Copy DT1 and add a new column
setkey(DT1b,id2)
DT1b
## id1 id2 v1 v2
## 1: 2 e 0.7383247 -0.89691455
## 2: 1 g 1.5952808 0.18484918
## 3: 2 j 0.3295078 1.58784533
## 4: 3 n -0.8204684 -1.13037567
## 5: 3 s 0.5757814 -0.08025176
## 6: 1 u 0.4874291 0.13242028
DT2b<-rbindlist(list(DT2,data.table(id2="e",v1=0))) # Copy DT2 and add a new row
DT2b[,v2:=-1] # Add a new column to DT2b
setkey(DT2b,id2)
DT2b
## id2 v1 v2
## 1: e 0 -1
## 2: n 0 -1
## 3: u 0 -1
Based on the helpful answers from @nmel and @BlueMagister, I came up with this solution for the updated scenario:
DT1b[DT2b[DT1b[id1 %in% c(1,2)],nomatch=0],c("v1","v2"):=list(i.v1,i.v2)]
DT1b
## id1 id2 v1 v2
## 1: 2 e 0.0000000 -1.00000000
## 2: 1 g 1.5952808 0.18484918
## 3: 2 j 0.3295078 1.58784533
## 4: 3 n -0.8204684 -1.13037567
## 5: 3 s 0.5757814 -0.08025176
## 6: 1 u 0.0000000 -1.00000000
End Citation
This is reproducible using data.table 1.8.10, but it is not working any more using data.table 1.9.3 @ R 3.1.0. The result is instead:
DT1b[DT2b[DT1b[id1 %in% c(1,2)],nomatch=0],c("v1","v2"):=list(i.v1,i.v2)]
DT1b
id1 id2 v1 v2
## 1: 2 e 0.7383247 -0.89691455
## 2: 1 g 1.5952808 0.18484918
## 3: 2 j 0.3295078 1.58784533
## 4: 3 n -0.8204684 -1.13037567
## 5: 3 s 0.5757814 -0.08025176
## 6: 1 u 0.4874291 0.13242028
due to
DT1b[DT2b[DT1b[id1 %in% c(1,2)],nomatch=0], list(i.v1)]
## id1 id2 v1 v2 i.v1 i.v2 i.id1 i.v1 i.v2
## 1: 2 e 0.7383247 -0.8969145 0 -1 2 0.7383247 -0.8969145
## 2: 1 u 0.4874291 0.1324203 0 -1 1 0.4874291 0.1324203
Does anybody have a suggestion, how to fix it?
nachti
EDIT:
If I fix the code due to 1.9.3 (see @arun), it's not working in 1.8.11 any more. That's a problem, because it's implemented in a funcion in a package used with different versions of R and data.table. The change of the defaults in 1.9.3 breaks existing code, which shoud not be (see. DT FAQ 1.8). Would be fine if there is a possibility that code works with different versions of DT.