Here is my original data frame:
df <- read.table(text="
  Date         Index  Event
  2014-03-31   A      x
  2014-03-31   A      x
  2014-03-31   A      y
  2014-04-01   A      y
  2014-04-01   A      x
  2014-04-01   B      x
  2014-04-02   B      x
  2014-04-03   A      x
  2014-09-30   B      x", header = TRUE, stringsAsFactors = FALSE)
date_range <- seq(as.Date(min(df$Date)), as.Date(max(df$Date)), 'days')
indices <- unique(df$Index)
events_table <- unique(df$Event)
I want to summarise the data frame so that there is exactly one record for each index in indices on each date in date_range, with a new column for each event in events_table giving the cumulative count of that event for that index over all dates before the value in the Date column. Not every index has records on every date.
Here is my desired output:
Date        Index  cumsum(Event = x) cumsum(Event = y)
2014-03-31  A      0                 0
2014-03-31  B      0                 0
2014-04-01  A      2                 1
2014-04-01  B      0                 0
2014-04-02  A      3                 2
2014-04-02  B      1                 0
...  
2014-09-29  A      4                 2
2014-09-29  B      2                 0
2014-09-30  A      4                 2
2014-09-30  B      2                 0
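For reference, the target table above can be reproduced with a base R sketch along the following lines. This is only one possible approach, not necessarily the fastest one for ~200,000 rows. The helper name count_prior and the output column names cum_x and cum_y are made up for illustration. The idea is to count events per day on the full date-by-index grid (zeros where there are no records), take a running total per index, and subtract each day's own count so that only earlier dates are included.

```r
# Sample data from the question
df <- read.table(text = "
  Date         Index  Event
  2014-03-31   A      x
  2014-03-31   A      x
  2014-03-31   A      y
  2014-04-01   A      y
  2014-04-01   A      x
  2014-04-01   B      x
  2014-04-02   B      x
  2014-04-03   A      x
  2014-09-30   B      x", header = TRUE, stringsAsFactors = FALSE)
df$Date <- as.Date(df$Date)

date_range   <- seq(min(df$Date), max(df$Date), by = "day")
indices      <- unique(df$Index)
events_table <- unique(df$Event)

# Hypothetical helper: for one event, count occurrences strictly before each
# date, per index. Returns a vector ordered like expand.grid() below
# (all dates for the first index, then all dates for the next, ...).
count_prior <- function(ev) {
  hits <- df[df$Event == ev, ]
  # Daily counts on the full date x index grid; missing combinations become 0
  daily <- unclass(table(
    factor(as.character(hits$Date), levels = as.character(date_range)),
    factor(hits$Index, levels = indices)
  ))
  # Running total down each index column, minus the day's own count,
  # so the current date is excluded
  as.vector(apply(daily, 2, cumsum) - daily)
}

# One row per Date/Index pair; Date varies fastest, matching count_prior()
result <- expand.grid(Date = date_range, Index = indices,
                      stringsAsFactors = FALSE)
for (ev in events_table) {
  result[[paste0("cum_", ev)]] <- count_prior(ev)
}

# Sort to match the layout of the desired output
result <- result[order(result$Date, result$Index), ]
rownames(result) <- NULL
head(result)
```

Using factors with explicit levels is what fills in the dates and indices that have no records, so no separate merge against the grid is needed.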
FYI -- this is a simplified version of the data frame. There are ~200,000 records per year with hundreds of different Index fields for each Date.
I did this once before (the code was lost when my hard drive fried) using by and possibly aggregate, but the process was very slow, and I haven't been able to work it out again this time around. I've also tried ddply, but I couldn't get the cumsum function to work with it. Using ddply, I tried something like:
library(plyr)
# Counts of each event per existing Date/Index pair (per day, not cumulative)
ddply(df, .(Date, Index), summarise,
      sum.x = sum(Event == 'x'),
      sum.y = sum(Event == 'y'))
to no avail.
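To illustrate the gap: a per-group summary like the one above yields daily counts only. A rough base R sketch of how cumsum could be applied per index on top of such a summary is shown below, using aggregate and ave instead of plyr. The column names is_x, is_y, cum.x and cum.y are made up for illustration. Note that this still covers only Date/Index pairs that exist in the data, and the running totals include the current date, so it does not yet produce the desired output.

```r
# Sample data from the question
df <- read.table(text = "
  Date         Index  Event
  2014-03-31   A      x
  2014-03-31   A      x
  2014-03-31   A      y
  2014-04-01   A      y
  2014-04-01   A      x
  2014-04-01   B      x
  2014-04-02   B      x
  2014-04-03   A      x
  2014-09-30   B      x", header = TRUE, stringsAsFactors = FALSE)

# Indicator columns so that sum() counts each event
df$is_x <- df$Event == "x"
df$is_y <- df$Event == "y"

# Daily counts per existing Date/Index pair (same idea as the ddply call)
daily <- aggregate(cbind(is_x, is_y) ~ Date + Index, data = df, FUN = sum)

# Sort by index, then date (ISO dates sort correctly as strings),
# so that cumsum runs in date order within each index
daily <- daily[order(daily$Index, daily$Date), ]

# Running totals within each Index; these include the current date
daily$cum.x <- ave(daily$is_x, daily$Index, FUN = cumsum)
daily$cum.y <- ave(daily$is_y, daily$Index, FUN = cumsum)
daily
```

Subtracting the daily count from the running total would exclude the current date, but the rows for dates without any records would still be missing.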
Through searching, I found "Replicating an Excel SUMIFS formula", which gets me the cumulative part of my project, but with it I wasn't able to figure out how to summarise the result down to only one record per date/index combination. I also came across "sum/aggregate data based on dates, R", but there I wasn't able to work out the dynamic date aspect.
Thanks to anyone who can help!