Good morning,
I'm running a block of code and its taking a bit too long. The goal is to make a "moving average" of the last X number of days. In this case, the previous 2, 3, 4, and 5 days of scores. It needs to be from the last date Date, for a unique id EID. Here it the data:
df:
     EID            Date        Score
     1111         5/25/2015        10
     1111         5/29/2015         6
     1111         6/17/2015         9
    12345         5/27/2015        10
    12345          1/1/2015         8
    12345          1/7/2015         9
    12345          1/9/2015        10
    12345         1/10/2015         7
    19611         1/13/2015         8
    19611         1/21/2015        10
    19611         1/23/2015         9
    19611         1/24/2015        10
    19611         1/30/2015         5
    19611          2/5/2015         6
    19611         2/11/2015        10
    19611         2/12/2015         7
    19611         2/14/2015        10
    19611         2/15/2015         6
    19611         2/18/2015        10
    19611         2/19/2015        10
This is what I'm currently running on 500,000+ rows across 6 data frames
uniqueID <- unique(df$EID)
rowNr    <- lapply(uniqueID,function(uniqueID){which(df$EID==uniqueID)})
lastDate <- lapply(rowNr,function(n){df$Date[rev(n)[1]]})
Avg      <- lapply(rowNr,function(n){mean(df$Score[n])})
prev2    <- lapply(rowNr,function(n){mean(df$Score[head(tail(c(NA,n),3),2)])})
prev3    <- lapply(rowNr,function(n){mean(df$Score[head(tail(c(NA,n),4),3)])})
prev4    <- lapply(rowNr,function(n){mean(df$Score[head(tail(c(NA,n),5),4)])})
prev5    <- lapply(rowNr,function(n){mean(df$Score[head(tail(c(NA,n),6),5)])})
Scores <- data.frame(EID       = uniqueID,                       
                     avg_score = unlist(Avg),                       
                     score2    = unlist(prev2),                       
                     score3    = unlist(prev3),                       
                     score4    = unlist(prev4),                       
                     score5    = unlist(prev5))
Here are the results
View(Scores)
EID     avg_score   score2  score3  score4  score5
1111    8.33        7.50    8.33    NA      NA
12345   8.80        8.50    8.67    8.50    8.80
19611   8.42        10.00   8.67    9.00    8.60
Any ideas to make this run faster? Im currently running this on 6 data frames and it takes 10-15 minutes for each one to process. How can this be optimized to run faster?
Thanks!
 
    