Say I have two data table dm and dn:
library(data.table)
set.seed(12)
dates = seq.Date(as.Date('2015-09-01'),as.Date('2015-11-01'), 2)
dm = data.table(user=sample(LETTERS[1:4], 10, replace=T),
time=sample(dates, 10))
dn = data.table(user=sample(LETTERS[1:8], 3, replace=F),
start=c(as.Date('2015-09-01'), as.Date('2015-10-05'),
as.Date('2015-09-14')),
end=c(as.Date('2015-10-30'), as.Date('2015-11-01'),
as.Date('2015-10-20')))
>dm
# user time
# 1: A 2015-09-25
# 2: D 2015-10-19
# 3: D 2015-09-21
# 4: B 2015-10-27
# 5: A 2015-09-15
# 6: A 2015-09-23
# 7: A 2015-10-21
# 8: C 2015-10-31
# 9: A 2015-10-01
# 10: A 2015-09-05
>dn
# user start end
# 1: B 2015-09-01 2015-10-30
# 2: F 2015-10-05 2015-11-01
# 3: A 2015-09-14 2015-10-20
How can one do to subset dm based on the columns of dn? For example, for each user in dn, we look up dm for the matched user and subset the rows having time falling between the user's time interval [start, end], if there is any.
In this example, the desired outcome is
user time start end
5: A 2015-09-15 2015-09-14 2015-10-20
6: A 2015-09-23 2015-09-14 2015-10-20
9: A 2015-10-01 2015-09-14 2015-10-20
10: A 2015-09-05 2015-09-14 2015-10-20
4: B 2015-10-27 2015-09-01 2015-10-30
The row number is retained just for illustration, and the order of time doesn't matter.