Okay, I have a pretty heavy lifting problem for a loop in R that is taxing my knowledge. Any advice or direction is appreciated.
I have a list of dates and phone numbers. The phone number is being used as the index and can appear for multiple dates.
    library(dplyr)
    Date = as.Date(c("2019-03-01", "2019-03-15","2019-03-29",      "2019-04-10","2019-03-05","2019-03-20"))
    Phone = c("070000001","070000001","070000001","070000001","070000002","070000002")
    df<-data.frame(Date,Phone)
    df
    ##         Date     Phone
    ## 1 2019-03-01 070000001
    ## 2 2019-03-15 070000001
    ## 3 2019-03-29 070000001
    ## 4 2019-04-10 070000001
    ## 5 2019-03-05 070000002
## 6 2019-03-20 070000002
I then computed the difference between dates for each interaction
    df<-df %>%
      arrange(Phone,Date) %>%
      group_by(Phone) %>%
      mutate(Diff = Date - lag(Date))
    df
    ##   Date       Phone     Diff   
    ##   <date>     <fct>     <drtn> 
    ## 1 2019-03-01 070000001 NA days
    ## 2 2019-03-15 070000001 14 days
    ## 3 2019-03-29 070000001 14 days
    ## 4 2019-04-10 070000001 12 days
    ## 5 2019-03-05 070000002 NA days
    ## 6 2019-03-20 070000002 15 days
Now comes the hard part. I am trying to create a dynamic column range that counts the number of rows from the earliest date/phone pair until the sum of the data difference is =< 30. Then a subsequent column is created for the next incident. Once a new number begins the columns start back at 1. So it becomes an record on incidents in which multiple rows may have membership in each instance.
I have tried several iterations of for and while loop combinations but they were so terrible that I just deleted them. I also tried this as a matrix function but the logic was far to complicated for the query and even when going over a small sample set my computer wouldn't like it.
This is as far as I got.
current.sum <- 0
for (c in 1:nrow(df)) {
  current.sum <- current.sum + df[c, "Diff"]
  df[c, "Diff"] <- current.sum
  if (current.sum <= 30) {
    df[c, "Int"] <- nrow(current.sum)
    current.sum <- NA()
  }
}
The desired dataset would look like this.There are essentially five steps:
- For each phone number
 - Sum Diff until it reaches 30
 - Count the number of rows
 - Create a column and place the row count next to the included rows
 - Restart (either for the next phone number or for a subsequent sum up to 30)
 
    ##         Date     Phone  Diff Int_1 Int_2
    ## 1 2019-03-01 070000001    NA     3    NA
    ## 2 2019-03-15 070000001    14     3    NA
    ## 3 2019-03-29 070000001    14     3     2
    ## 4 2019-04-10 070000001    12    NA     2
    ## 5 2019-03-05 070000002    NA     1    NA
    ## 6 2019-03-20 070000002    15     1    NA
Update 18/07/2019
I have been able to get 70% of the way, but am still missing the transposed columns and dual membership of date/phone combinations. I might be able to do with the transpose I can add this the solution below a way to evaluate the last group date to see if it should be included in membership rather than being exclusive to the 30 days threshold.
     cumsum_group <- function(x, threshold) {
      cumsum <- 0
      group <- 1
      result <- numeric()
      for (i in 1:length(x)) {
        cumsum <- cumsum + x[i]
        if (cumsum > threshold) {
          group <- group + 1
          cumsum <- x[i]
        }
        result = c(result, group)
      }
      return (result)
    }
    df<-df %>% group_by(binning=cumsum_group(Diff, 30))
The issue is that line 3 and 4 both belong to instance 2 of 070000001 the previous entry when added together equals 26. So the function needs to both lag to the previous value once the threshold is reached AND have a new column that allows that to be flagged.