This question gives an example on how to convert the by.x = and by.y = arguments in base R merge to data.table syntax, to specify differently named columns as the join key:
data.table merge by multiple columns
However, I can't work out how to do the same for a non-equi join, and I am very confused by the output.
Example data:
set.seed(0)
tmp_dt1<- data.table(grp = c(1,2), time = runif(100))
tmp_dt2 <- data.table(grp = c(1,2), time = c(0.1, 0.5))
tmp_dt2 <- tmp_dt2[, time_to := time + 0.2]
tmp_dt2 <- tmp_dt2[, time_from := time] # for clarity, rename time variable
I would like to equi-join the two tables by grp, followed by a non-equi join so that I only keep time from tmp_dt1 where it falls between time_to and time_from. From what I can tell tmp_dt1[tmp_dt2, , on = c("grp", "time>=time", "time<=time_to")] does what I want:
> tmp_dt1[tmp_dt2, , on = c("grp", "time>=time", "time<=time_to")]
grp time time.1 time_from
1: 1 0.1 0.3 0.1
2: 1 0.1 0.3 0.1
3: 1 0.1 0.3 0.1
4: 1 0.1 0.3 0.1
5: 1 0.1 0.3 0.1
6: 1 0.1 0.3 0.1
7: 1 0.1 0.3 0.1
...
What confuses me is that x.time is missing, and the resulting column names are very confusing. For example, why is there a column called time.1? I would like to clarify the syntax so that tmp_dt1[tmp_dt2, , on = c("grp", "time>=y.time", "time<=y.time_to")] produces:
grp y.time y.time_to time_from
1: 1 0.1 0.3 0.1
2: 1 0.1 0.3 0.1
3: 1 0.1 0.3 0.1
4: 1 0.1 0.3 0.1
5: 1 0.1 0.3 0.1
6: 1 0.1 0.3 0.1
7: 1 0.1 0.3 0.1
...
and some how also extract the column x.time, in addition to all columns in y. Unfortunately this fails with the error:
> tmp_dt1[tmp_dt2, , on = c("grp", "time>=y.time", "time<=y.time_to")]
Error in `[.data.table`(tmp_dt1, tmp_dt2, , on = c("grp", "time>=y.time", :
Column(s) [y.time,y.time_to] not found in i
Trying the following also does not produce what I expect, instead I get:
> tmp_dt1[tmp_dt2, .(grp, time, time_from = i.time, time_to = i.time_to), on = c("grp", "time>=time", "time<=time_to")]
grp time time_from time_to
1: 1 0.1 0.1 0.3
2: 1 0.1 0.1 0.3
3: 1 0.1 0.1 0.3
4: 1 0.1 0.1 0.3
5: 1 0.1 0.1 0.3
6: 1 0.1 0.1 0.3
7: 1 0.1 0.1 0.3
where the time column bears no resemblance to tmp_dt1$time.