I've been struggling with this task for quite some time, so I thought I'd ask for your help.
I'm trying to add a new column to df2 based on information in both df1 and df2. The new ID column in df2 should take the ID from df1 whenever the locations in both data frames match and the time stamp in df2 falls within the interval given by start and end in df1; otherwise it should be 0. The issue is that the two data frames have different numbers of rows. I know how to write a nested for loop, but it's ugly and takes forever to run. I tried using sapply, as proposed in a solution to a similar issue, but it won't run because the data frames have different lengths.
I found the thread "Speed up the loop operation in R", but because the data frames my conditionals act on are of different lengths, I can't get that solution to work.
This is my data:
df1 <- structure(list(ID = c(NA, NA, 10035010L), location = c("barge", 
"barge", "barge"), start = structure(c(NA, NA, 
1427301960), class = c("POSIXct", "POSIXt"), tzone = ""), end = structure(c(NA, 
NA, 1437418440), class = c("POSIXct", "POSIXt"), tzone = "")), .Names = c("ID", 
"location", "start", "end"), row.names = c(NA, 3L), class = "data.frame")
df2<-structure(list(time = structure(c(1419062220, 1419063120, 1427325120, 
1427325240, 1427325360, 1427325540, 1427325660, 1427326680, 1427568960, 
1427569320, 1427569500), class = c("POSIXct", "POSIXt"), tzone = ""), 
    location = c("barge", "barge", "barge", 
    "barge", "barge", "barge", "barge", 
    "barge", "barge", "barge", "barge"
    )), row.names = c(222195L, 222196L, 186883L, 186884L, 186885L, 
186886L, 186887L, 186888L, 186930L, 186931L, 186932L), class = "data.frame", .Names = c("time", 
"location"))
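The logic described above, written as the slow nested loop, might look roughly like this. It is a minimal sketch on made-up data: obs and intervals are hypothetical stand-ins for df2 and df1, and treating the interval as half-open (time >= start, time < end) is an assumption.

```r
# Made-up toy data; 'intervals' plays the role of df1, 'obs' the role of df2
intervals <- data.frame(
  ID       = c(NA, 10035010L),
  location = c("barge", "barge"),
  start    = as.POSIXct(c(NA, "2015-03-25 16:46:00"), tz = "UTC"),
  end      = as.POSIXct(c(NA, "2015-07-20 18:54:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)
obs <- data.frame(
  time     = as.POSIXct(c("2014-12-20 07:57:00", "2015-03-25 23:12:00"), tz = "UTC"),
  location = c("barge", "barge"),
  stringsAsFactors = FALSE
)

# Default to 0, then overwrite with the ID of any matching interval
obs$ID <- 0L
for (i in seq_len(nrow(obs))) {
  for (j in seq_len(nrow(intervals))) {
    # Rows with a missing start or end can never match
    hit <- obs$location[i] == intervals$location[j] &&
      !is.na(intervals$start[j]) && !is.na(intervals$end[j]) &&
      obs$time[i] >= intervals$start[j] && obs$time[i] < intervals$end[j]
    if (hit) obs$ID[i] <- intervals$ID[j]
  }
}
obs$ID
```

This visits every pair of rows, so the running time grows with nrow(obs) * nrow(intervals), which is why it becomes slow on the full data.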
UPDATE: I decided to go with the dplyr package as I feel comfortable working with it, and used it on my larger dataset. However, a problem arises because the output is not consistent across locations when I include station ID.
Consider the same but slightly modified datasets that include station to see the difference in results:
df3<-structure(list(time = structure(c(1419061860, 1419062220, 1419063120, 
1427325120, 1427325240, 1427325360, 1427325540, 1427325660, 1427326680, 
1427568960, 1427569320), class = c("POSIXct", "POSIXt"), tzone = ""), 
    station = c(104667L, 104667L, 104667L, 124083L, 124083L, 
    124083L, 124083L, 124083L, 124083L, 124083L, 124083L), location = c("barge", 
    "barge", "barge", "barge", "barge", 
    "barge", "barge", "barge", "barge", 
    "barge", "barge")), row.names = 879:889, class = "data.frame", .Names = c("time", "station", "location"))
and
df4<-structure(list(station = c(124083L, 113071L), location = c("barge", 
"barge"), ID = c(10035010L, NA), start = structure(c(1427301960, 
NA), class = c("POSIXct", "POSIXt"), tzone = ""), end = structure(c(1437418440, 
NA), class = c("POSIXct", "POSIXt"), tzone = "")), row.names = 3:4, class = "data.frame", .Names = c("station", 
"location", "ID", "start", "end"))
When I run the dplyr solution,
library(dplyr)
df3 %>% left_join(., df4) %>%
  mutate(ID = ifelse(time >= start & time < end, ID, 0))
it doesn't return the same kind of output: in the first case (without station) the returned dataset has a multiple of the original number of rows, while in the second case (with station) it has the same number of rows as the original. I just can't figure out why it's different, and it makes using filter() impossible. Any suggestions on how to tackle this problem would be highly appreciated. Thanks!
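The row-count difference can be reproduced on a small scale. The sketch below uses made-up data (obs and intervals are hypothetical names) and base merge() with all.x = TRUE as a stand-in for a left join, so it runs without extra packages. Whether this is the mechanism at work in the real data is an assumption.

```r
# Made-up toy data: every row shares the same location
obs <- data.frame(time = 1:4, station = c(1L, 1L, 2L, 2L),
                  location = "barge", stringsAsFactors = FALSE)
intervals <- data.frame(station = c(2L, 3L, 4L), location = "barge",
                        ID = c(10L, NA, NA), stringsAsFactors = FALSE)

# Left join keyed on location only
by_location <- merge(obs[, c("time", "location")],
                     intervals[, c("location", "ID")],
                     by = "location", all.x = TRUE)

# Left join keyed on station and location
by_station <- merge(obs, intervals,
                    by = c("station", "location"), all.x = TRUE)

# Compare row counts before and after each join
c(original = nrow(obs),
  by_location = nrow(by_location),
  by_station = nrow(by_station))
```

In this toy case the location-only join returns 12 rows from 4, while the station-plus-location join returns 4.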