I have a data.frame in R that is a collection of returns by company by day. It is long but I can cast it to be wide. I would like to create new variables that calculates the sum of the next 7,30 and 90 daily returns for each day by company.
            Asked
            
        
        
            Active
            
        
            Viewed 297 times
        
    -5
            
            
        - 
                    3You should read [how to make a good reproducible example](http://stackoverflow.com/q/5963269/5977215) and then edit your question with data and an expected output. Otherwise your question will be down-voted and / or closed. – SymbolixAU Oct 21 '16 at 01:00
- 
                    Flat means 2-dimensional (rows and columns) as opposed to a higher dimensionality represented by a relational database. You meant long instead of flat, so I edited that bit. – Hack-R Oct 21 '16 at 01:21
2 Answers
3
            
            
        Here is a solution for the 7-day problem using dplyr and cumsum
It does rearrange the data.frame as a side-effect.
I'm borrowing the data that Hack-R supplied for their solution
df1 <- data.frame(
  company = rep(c("Ford", "Coca-cola", "Booz Allen Hamilton"),90),
  returns = rep(c(200,200,150,250,100,225),45),
  day     = NA
) 
df1$day[order(df1$company)] <- 1:90
df1 <- df1 %>%
       group_by(company) %>%
       arrange(day) %>%
       mutate(c_returns = cumsum(returns)) %>%
       ungroup()
df1 <- merge(df1 %>% mutate(day=day+6),
             df1,
             by=c('company','day'),
             suffixes=c('.0','.1'),
             all.x=T) %>%
       mutate(returns_7day = c_returns.1 - c_returns.0 + returns.0,
              returns=returns.0,
              day=day-6) %>%
       select(-c(returns.0, returns.1, c_returns.0, c_returns.1))
 
    
    
        Community
        
- 1
- 1
 
    
    
        stephematician
        
- 844
- 6
- 17
- 
                    @Hack-R I don't have enough rep yet to comment on your post, so I apologise if you took it personally that I didn't comment on the typo there. – stephematician Oct 21 '16 at 03:52
- 
                    i would use a `inner_join` if the dataset is large, it is much faster than `merge`. – stephematician Nov 13 '16 at 06:18
0
            
            
        There are packages that could facilate this, but you don't really need them. You could do this with a simple loop, or even wrap the code below into a function and just apply or sapply it.
All you do is take a running total, checking first to make sure there's sufficient data to do so.
df1 <- data.frame(
  company = rep(c("Ford", "Coca-cola", "Booz Allen Hamilton"),90),
  returns = rep(c(200,200,150,250,100,225),45),
  day     = NA
) 
df1$day[order(df1$company)] <- 1:90
df1$returns_next7  <- NA
df1$returns_next30 <- NA
df1$returns_next90 <- NA
for(c in df1$company){
  tmp <- df1[df1$company == c,]
  for(i in 1:nrow(tmp)){
    if(nrow(tmp)-i >=7){
      tmp$returns_next7[i] <- sum(tmp$returns[i:(i+6)])
    } 
  }
  df1[df1$company ==c,] <- tmp
} # Same logic for 30 and 90 days
head(df1$returns_next7)
[1] 1550 1100 1275 1600 1000 1350
 
    
    
        Hack-R
        
- 22,422
- 14
- 75
- 131
