averaging subsets within a data frame while retaining comments

Question

Say I have a data frame, `data`, that contains multiple sites, indicated by integer site codes. Within those sites are samples from multiple horizons (A, B and C), each with an observation of some type recorded in the column `value`:

site<- c(12,12,12,12,45,45,45,45)
horizon<-c('A','A','B','C','A','A','B','C')
value<- c(19,14,3,2,18,19,4,5)
comment<- c('pizza','pizza','pizza','pizza','taco','taco','taco','taco')
data<- data.frame(site,horizon,value,comment)

Which looks like this:

  site horizon value comment
1   12       A    19   pizza
2   12       A    14   pizza
3   12       B     3   pizza
4   12       C     2   pizza
5   45       A    18    taco
6   45       A    19    taco
7   45       B     4    taco
8   45       C     5    taco

In this case both sites have multiple A observations. I would like to average the values of duplicate horizons within a site while retaining the comment column in the data frame. All observations within a site have the same entry in the comment vector. I would like the output to look like this:

  site horizon value comment
1   12       A  16.5   pizza
3   12       B     3   pizza
4   12       C     2   pizza
5   45       A  18.5    taco
7   45       B     4    taco
8   45       C     5    taco
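A base R sketch of the same idea, assuming only the example data above. Because `comment` is constant within each site, it can be added to the grouping formula of `aggregate()` without splitting any group, so it is carried through to the result. The name `avg` is illustrative.

```r
# Rebuild the example data from the question
site <- c(12, 12, 12, 12, 45, 45, 45, 45)
horizon <- c('A', 'A', 'B', 'C', 'A', 'A', 'B', 'C')
value <- c(19, 14, 3, 2, 18, 19, 4, 5)
comment <- c('pizza', 'pizza', 'pizza', 'pizza', 'taco', 'taco', 'taco', 'taco')
data <- data.frame(site, horizon, value, comment)

# comment is constant within a site, so grouping on it as well does not
# split any site/horizon group; it is simply kept in the output
avg <- aggregate(value ~ site + horizon + comment, data = data, FUN = mean)

# Sort by site and horizon and restore the column order of the question
avg <- avg[order(avg$site, avg$horizon), c("site", "horizon", "value", "comment")]
rownames(avg) <- NULL
print(avg)
```

If `comment` ever varied within a site, this would produce separate rows per comment instead of averaging across them. That is why the approach depends on the "same entry within a site" property stated above.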

With dplyr, this works: `data %>% group_by(site,horizon,comment) %>% summarise_each(funs(mean))`, though you should have 16.5 not 18.5 in the first row, eh? — Frank, Nov 05 '15 at 20:34
@Frank thanks! Is there any way to have dplyr do this without specifying the comment vector? My real data set has many, many comment vectors. — colin, Nov 05 '15 at 20:40
Hm, you could do `data %>% group_by_(.dots=setdiff(names(.),"value")) %>% summarise_each(funs(mean))`. Personally, I would just keep the `comment` data in a separate table if it's determined by `site`. — Frank, Nov 05 '15 at 20:42
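Frank's `setdiff(names(.), "value")` idea also works in base R, with no need to name the comment columns. This sketch assumes base R only. `group_cols` is an illustrative name, and the extra `crew` column is hypothetical. It stands in for the many comment columns of the real data set.

```r
data <- data.frame(
  site    = c(12, 12, 12, 12, 45, 45, 45, 45),
  horizon = c("A", "A", "B", "C", "A", "A", "B", "C"),
  value   = c(19, 14, 3, 2, 18, 19, 4, 5),
  comment = c("pizza", "pizza", "pizza", "pizza", "taco", "taco", "taco", "taco")
)
# Hypothetical second site-level column (not in the question)
data$crew <- ifelse(data$site == 12, "north", "south")

# Every column except the one being averaged becomes a grouping column,
# so any number of site-level columns is carried through without naming them
group_cols <- setdiff(names(data), "value")
avg <- aggregate(data["value"], by = data[group_cols], FUN = mean)

avg <- avg[order(avg$site, avg$horizon), ]
rownames(avg) <- NULL
print(avg)
```

One caveat is that `aggregate()` omits rows with `NA` in any grouping column. A missing comment would therefore silently drop that observation from the averages.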
@Frank Good point. I think I'm going to go that route actually. — colin, Nov 05 '15 at 20:51
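Frank's other suggestion is to keep the site-level data in its own table. In base R that might look like the sketch below, which assumes every comment column is fully determined by `site`. The table names `sites`, `obs` and `combined` are illustrative.

```r
data <- data.frame(
  site    = c(12, 12, 12, 12, 45, 45, 45, 45),
  horizon = c("A", "A", "B", "C", "A", "A", "B", "C"),
  value   = c(19, 14, 3, 2, 18, 19, 4, 5),
  comment = c("pizza", "pizza", "pizza", "pizza", "taco", "taco", "taco", "taco")
)

# Site-level attributes: one row per site
sites <- unique(data[c("site", "comment")])
# If comment were not fully determined by site, a site would appear twice here
stopifnot(!anyDuplicated(sites$site))

# Observation-level table: only the columns that vary within a site
obs <- data[c("site", "horizon", "value")]

# Averaging only touches the observation table...
avg <- aggregate(value ~ site + horizon, data = obs, FUN = mean)

# ...and the site attributes are joined back only when a combined view is needed
combined <- merge(avg, sites, by = "site")
combined <- combined[order(combined$site, combined$horizon), ]
rownames(combined) <- NULL
print(combined)
```

With many comment columns, `sites` simply gets more columns. The averaging step does not change.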

jogo · Answer 1 · 2015-11-05T21:40:54.017


d <- read.table(header=TRUE, text=
'  site horizon value comment
1   12       A    19   pizza
2   12       A    14   pizza
3   12       B     3   pizza
4   12       C     2   pizza
5   45       A    18    taco
6   45       A    19    taco
7   45       B     4    taco
8   45       C     5    taco')
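# Average value over duplicated site/horizon pairs, then merge the
# site-level columns back in: d[,-3] drops column 3 (value), and unique()
# keeps one row per site/horizon/comment combination
merge(aggregate(value ~ site+horizon, FUN=mean, data=d), unique(d[,-3]))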

edited Nov 05 '15 at 21:40

answered Nov 05 '15 at 20:37

