1. dplyr/tidyr
It may be better to convert the 'wide' format to 'long' format. We could use dplyr/tidyr to get the mean. Create a 'ind' column, reshape the data to 'long' using gather, split the 'variable' column into two columns ('var1', 'var2') with extract, group by 'ind', get the mean values of the 'value' column after subsetting it based on the different logical index created (i.e. var2 < startyear, var2 >= startyear & var2 <= endyear, and var2 >endyear)
library(dplyr)
library(tidyr)
dS <- df %>%
mutate(ind=row_number()) %>%
gather(variable, value, starts_with('y')) %>%
extract(variable, c('var1', 'var2'), '([^0-9]+)([0-9]+)',
convert=TRUE) %>%
group_by(ind) %>%
summarise(before_mean= mean(value[var2 < startyear]),
within_mean = mean(value[var2 >= startyear &
var2 <= endyear]),
after_mean=mean(value[var2 >endyear])) %>%
as.data.frame()
nm1 <- paste(c('before', 'within', 'after'), 'mean', sep="_")
dS
# ind before_mean within_mean after_mean
#1 1 629.6667 44.0 65.0
#2 2 636.0000 57.2 1179.4
We can create additional columns in 'df' from the above output
df[nm1] <- dS
2. base R
We can use base R methods and without changing the format of the dataset. From the original dataset ('df'), make an index ('indx') of numeric column names, remove the non-numeric part and convert to numeric ('v1').
indx <- grep('\\d+', names(df))
v1 <- as.numeric(sub('[^0-9]+', '', names(df)[indx]))
Loop the rows of 'df' (lapply), match the 'startyear' with 'v1', use that index ('i1') to get the columns, unlist, and calculate the mean. The same can be done by matching the 'endyear' with 'v1' to get the index ('i2'). Based on 'i1', and 'i2', calculate the 'within_mean' and 'after_mean'. rbind the list elements and assign the output to new columns ('nm1') in 'df'.
df[nm1] <- do.call(rbind,lapply(1:nrow(df), function(i) {
i1 <- match(df$startyear[i], v1)
before_mean<- mean(unlist(df[i,1:(i1-1),drop=FALSE]))
i2 <- match(df$endyear[i], v1)
within_mean <- mean(unlist(df[i,i2:i1]))
after_mean <- mean(unlist(df[i,match(v1[(i2+1):length(v1)],v1)]))
data.frame(before_mean,within_mean, after_mean) }))
df[nm1]
# before_mean within_mean after_mean
#1 629.6667 44.0 65.0
#2 636.0000 57.2 1179.4