Here a MWE of my problem.
Data:
library(data.table)
#dates in %Y-%m-%d    
    df <- data.table(date=as.Date(c("2001-01-02", "2001-01-02", "2001-01-02", "2001-01-02", "2001-01-02", "2001-01-02", "2001-01-02", "2001-01-02", "2001-01-02", "2001-01-02", "2001-01-02", "2001-01-02", "2001-01-02", "2001-01-02", "2001-01-02", "2001-01-02")), dtm=c(18L,  18L, 18L,  18L,  18L, 18L,46L,46L,74L, 74L,74L,74L,165L, 165L,165L,165L), cval=c(1275L, 1300L, 1300L, 1320L, 1325L, 1325L, 1300L, 1300L, 1300L, 1300L, 1325L, 1325L, 1300L, 1300L, 1325L, 1325L), price_in=c(24.125, 24.625, 35.750, 16.250, 14.500, 50.250, 43.625, 49.125, 58.250, 58.250, 45.375, 70.125, 90.750, 74.750, 77.875, 85.500), price_out=c(26.125, 26.625, 36.625, 17.500, 15.500, 52.250, 45.625, 51.125, 60.000, 60.250, 47.375, 72.125, 92.750, 76.750, 79.875, 87.500), type=c("P", "C", "P", "C", "C", "P", "C", "P", "C", "P", "C", "P", "C", "P", "C", "P"))
    df
          date dtm cval price_in price_out type
 1: 2001-01-02  18 1275   24.125    26.125    P
 2: 2001-01-02  18 1300   24.625    26.625    C
 3: 2001-01-02  18 1300   35.750    36.625    P
 4: 2001-01-02  18 1320   16.250    17.500    C
 5: 2001-01-02  18 1325   14.500    15.500    C
 6: 2001-01-02  18 1325   50.250    52.250    P
 7: 2001-01-02  46 1300   43.625    45.625    C
 8: 2001-01-02  46 1300   49.125    51.125    P
 9: 2001-01-02  74 1300   58.250    60.000    C
10: 2001-01-02  74 1300   58.250    60.250    P
11: 2001-01-02  74 1325   45.375    47.375    C
12: 2001-01-02  74 1325   70.125    72.125    P
13: 2001-01-02 165 1300   90.750    92.750    C
14: 2001-01-02 165 1300   74.750    76.750    P
15: 2001-01-02 165 1325   77.875    79.875    C
16: 2001-01-02 165 1325   85.500    87.500    P
What I want to do:
- For each date, I want to obtain all items, divided in type P's andC's that have the samedtmbut a largercval. For the second item in the example data set, this would be:
date       dtm cval price_in price_out type
 2001-01-02  18 1300   24.625    26.625    C  #the item
 2001-01-02  18 1320   16.250    17.500    C  #same dtm, higher cval
 2001-01-02  18 1325   14.500    15.500    C  #same dtm, higher cval
- Now, let cval1be thecvalof the current item, i.e. herecval1 = 1300andcval2the largercval's of the items in this subset, i.e. herecval2 = c(1320L, 1325L). Then, I want to apply a custom exclusion function, for example let's sayprice_in[cval %in% cval2]-price_out[cval==cval1]-0.5*(cval1-cval2) < 0
- I then want to exclude all item pairs whose exclusion function returned TRUE.
Similarly (same procedure, different exclusion criteria) applies to the P items. 
Expected Output: The original data.table, df, minus the rows that were excluded in the procedure just described above. For example, using the example function above evaluating items 2 and 4 returns TRUE: 16.25-26.625-0.5*(1300-1320) = -0.375 < 0. Thus, the expected output would be df without rows 2 and 4 (note that the pair 2 and 5 does not return TRUE: 14.5-26.625-0.5*(1300-1325) = 0.375 >= 0, hence 5 is not excluded):
          date dtm cval price_in price_out type
 1: 2001-01-02  18 1275   24.125    26.125    P
 3: 2001-01-02  18 1300   35.750    36.625    P
 5: 2001-01-02  18 1325   14.500    15.500    C
 6: 2001-01-02  18 1325   50.250    52.250    P
 7: 2001-01-02  46 1300   43.625    45.625    C
 8: 2001-01-02  46 1300   49.125    51.125    P
          ...      ...       ...       ...     
and so on. Obviously, as in the case of items 7 and 8, if there is no other item with the same characteristics (same date, dtm and type), it cannot be excluded.
What I have tried so far:
- I have created an id for each item, i.e. df[,id:=seq_along(date)], then iterated viafor loopthrough the dates and used vectors to check my custom functions. If the result vector containedTRUE's, I removed the corresponding indices from my data.table. Clearly, this approach works but runs almost forever given the size of my data.
- I am currently experimenting with data.table methods, because of their extreme speed advantages. If I understand it correctly, what I want are many rolling self-joins for each date/dtmsubset, something along the lines"df[df,roll=Inf,by=.(date,dtm)]"(since a total, rolling self-join, I believe, is not applicable in this case). But I don't quite get it to work.
Question: Is there a way of implementing this exclusion procedure via data.table methods? Possibly (but not necessarily) via multiple rolling self-joins?
Any help would be highly appreciated!
