I have a data frame with one grouping (by) variable and several variables to aggregate, but with different functions for different variables.
d <- data.frame(year=c(rep(2011,5), rep(2012,5)),
            v1 = sample(1:10, 10),
            v2 = sample(1:10, 10),
            v3 = sample(1:10, 10),
            v4 = sample(1:10, 10)
            )
d
#     year v1 v2 v3 v4
# 1  2011  1  7  1  3
# 2  2011  6  3  2 10
# 3  2011  7  9  5  8
# 4  2011 10  8  6  9
# 5  2011  3  2  8  4
# 6  2012  9  5  7  6
# 7  2012  2  6  9  5
# 8  2012  4  1  4  7
# 9  2012  5  4  3  1
# 10 2012  8 10 10  2
Now, v1 and v2 need to be aggregated by sum, and v3 and v4 by mean. If the variable names can be written explicitly as literals, ddply with summarize works well:
library(plyr)
ddply(d, "year", summarize, a1=sum(v1), a2=sum(v2), a3=mean(v3), a4=mean(v4))
#   year a1 a2  a3  a4
# 1 2011 27 29 4.4 6.8
# 2 2012 28 26 6.6 4.2
However, in my case the two sets of column names are available only as character vectors, i.e.:
cols1 <- c("v1", "v2")
cols2 <- c("v3", "v4")
# cols1 and cols2 are dynamically generated at runtime.
# v1,v2,v3,v4 are not directly available.
I have tried to achieve the aggregations by these two methods, but neither works:
# ddply without summarize
ddply(d, "year", function(x) cbind(colSums(x[cols1]), colMeans(x[cols2])))
# weird output!
# ddply with summarize
ddply(d, "year", summarize, colSums(cols1), colMeans(cols2))
#Error in colSums(cols1) : 'x' must be an array of at least two dimensions
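As far as I can tell, the error in the second attempt arises because cols1 is just a character vector of names, not the columns themselves, so colSums has nothing two-dimensional to work on. A minimal sketch of the difference, using a small hypothetical data frame (toy) rather than d:

```r
# Hypothetical toy data, only to illustrate the error; not the real d
toy <- data.frame(year = c(2011, 2011, 2012),
                  v1 = c(1, 2, 3),
                  v2 = c(4, 5, 6))
cols1 <- c("v1", "v2")

# cols1 holds column *names*, so colSums(cols1) fails;
# tryCatch captures the error message instead of stopping
err <- tryCatch(colSums(cols1), error = function(e) conditionMessage(e))

# Subsetting the data frame by those names gives colSums a two-dimensional input
ok <- colSums(toy[cols1])
```

Here ok is a named numeric vector with one sum per column, while err holds the error message from the failed call.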
If the best way to do this does not use ddply (say aggregate, maybe), that's perfectly okay.
The best workaround I have right now is to run the two aggregations separately and then merge the two resulting data frames on the by-variable.
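For reference, that workaround can be sketched roughly as follows. This is a minimal sketch, assuming base R's aggregate and merge; the data frame toy and the intermediate names sums, means and res are made up for illustration, while cols1 and cols2 are as defined above:

```r
# Hypothetical toy data with the same shape as d (fixed values, no sampling)
toy <- data.frame(year = rep(c(2011, 2012), each = 5),
                  v1 = 1:10,
                  v2 = 10:1,
                  v3 = c(2, 4, 6, 8, 10, 1, 3, 5, 7, 9),
                  v4 = rep(c(1, 3), 5))

# Column names known only as character vectors at runtime
cols1 <- c("v1", "v2")  # to be summed
cols2 <- c("v3", "v4")  # to be averaged

# Aggregate each set of columns separately, grouped by year
sums  <- aggregate(toy[cols1], by = toy["year"], FUN = sum)
means <- aggregate(toy[cols2], by = toy["year"], FUN = mean)

# Join the two results on the by-variable
res <- merge(sums, means, by = "year")
res
```

This works, but it needs one aggregate call per function plus a merge, which is what I would like to avoid.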
