I have a list of data frames
df1 <- data.frame(Symbol = c("A", "A", "B", "C", "D", "D", "D", "E", "E", "A"),
                     Sample1 = sample(50, 10),
                     Sample2 = sample(50, 10),
                     Sample3 = sample(50, 10))
df2 <- data.frame(Symbol = c("B", "B", "B", "E", "A"),
                  Sample4 = sample(50, 5),
                  Sample5 = sample(50, 5),
                  Sample6 = sample(50, 5))
df3 <- data.frame(Symbol = c("A", "B", "C", "D", "E", "F"),
                  Sample7 = sample(50, 6),
                  Sample8 = sample(50, 6),
                  Sample9 = sample(50, 6))
As you can see, each data frame has a column called Symbol, which contains values that appear in multiple data frames in the list. I'd like to be able to consolidate them so I can select some basic stats (e.g. mean, 25th percentile, etc...). Initially, I simply did this
dfList <- list(df1, df2, df3)
df <- reduce(dfList, merge, "Symbol", all = T)
thinking that I'd just take the mean of each row. However, I didn't know that symbols were appearing multiple times in each data frame. So the above line actually consolidated each occurrence of say A in the first data frame, with each occurrence of A in the second and third data frames. This was not what I wanted.
The messy solution is to just get the unique Symbol identifiers from the entire list, and write an lapply over them in which I search for each identifier in each data frame, and extract the values. I'll then have a list of vectors, with the names being the Symbol identifiers. Something like this
syms <- unique(as.character(unlist(lapply(dfList, function(x) return(x$Symbol)))))
lst <- lapply(syms, function(x) {
  return(unlist(lapply(dfList, function(y) {
    return(unlist(y[grep(x, y$Symbol),-1]))
  })))
})
names(lst) <- syms
Then I can just take the mean, median, etc...
This is very slow on my actual data frame, and I just think there must be an easier way to do this.
Edit: So in the end, I'd like to have somethign like this:
Mean = t(as.data.frame(lapply(lst, function(x) {
  return(mean(x))
})))
Min = t(as.data.frame(lapply(lst, function(x) {
  return(min(x))
})))
Quantiles = t(as.data.frame(lapply(lst, function(x) {
  return(quantile(x, c(0.25, 0.5, 0.75)))
})))
Max = t(as.data.frame(lapply(lst, function(x) {
  return(max(x))
})))
df <- cbind(Mean, Min, Quantiles, Max)
colnames(df) <- c("Mean", "Min", "p25", "p50", "p75", "Max")
This will produce
> df
      Mean Min   p25  p50   p75 Max
A 27.53333  13 21.50 27.0 32.50  48
B 24.33333   2 11.00 21.0 37.50  48
C 26.50000   3 24.00 26.5 32.75  45
D 25.00000   2 12.25 23.5 38.75  49
E 23.25000   2 12.75 22.0 33.75  45
F 21.00000  11 11.50 12.0 26.00  40
 
     
     
     
    