I am trying to get the standard deviation for one column in a data frame, grouped by several other columns.
library(tibble)  # provides tibble()
x <- c("Paul", "Paul", "Paul", "Jennifer", "Jennifer", "Jennifer")
y <- c("a", "a", "b", "c", "c", "d")
g <- c("eins", "eins", "zwei", "drei", "drei", "vier")
z <- c(1,2,3,4,5,6)
df <- tibble(Fall = x, DRG = y, DRG2 = g, Anzahl = z)
df$Fall <- as.factor(df$Fall)
df$DRG <- as.factor(df$DRG)
df$DRG2 <- as.factor(df$DRG2)
This is the tibble:
df
# A tibble: 6 x 4
  Fall     DRG   DRG2  Anzahl
  <fct>    <fct> <fct>  <dbl>
1 Paul     a     eins       1
2 Paul     a     eins       2
3 Paul     b     zwei       3
4 Jennifer c     drei       4
5 Jennifer c     drei       5
6 Jennifer d     vier       6
Calculating the mean works:
aggregate(x = df, 
          by = list(df$Fall, df$DRG, df$DRG2),
          FUN = mean, na.rm = TRUE)
   Group.1 Group.2 Group.3 Fall DRG DRG2 Anzahl
1 Jennifer       c    drei   NA  NA   NA    4.5
2     Paul       a    eins   NA  NA   NA    1.5
3 Jennifer       d    vier   NA  NA   NA    6.0
4     Paul       b    zwei   NA  NA   NA    3.0
Standard deviation gives me an error:
aggregate(x = df, 
          by = list(df$Fall, df$DRG, df$DRG2),
          FUN = sd, na.rm = TRUE)
Error in var(if (is.vector(x) || is.factor(x)) x else as.double(x), na.rm = na.rm) : 
  Calling var(x) on a factor x is defunct.
  Use something like 'all(duplicated(x)[-1L])' to test for a constant vector.
Why is that? I tried to understand the error message, but I don't understand why this works with mean but not with standard deviation. If I convert all the factors to characters, the standard deviation call runs and gives me the correct result.
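For reference, here is a minimal sketch of the behaviour in question, assuming the goal is the standard deviation of `Anzahl` only. It rebuilds the same data as a base `data.frame` (an assumption, to avoid the tibble dependency), shows how `mean()` and `sd()` treat a factor differently, and then uses the formula interface of `aggregate()` so that the factor columns are used for grouping only. The names `m`, `err` and `res` are illustrative.

```r
# Same data as above, as a base data.frame with factor grouping columns
df <- data.frame(
  Fall   = c("Paul", "Paul", "Paul", "Jennifer", "Jennifer", "Jennifer"),
  DRG    = c("a", "a", "b", "c", "c", "d"),
  DRG2   = c("eins", "eins", "zwei", "drei", "drei", "vier"),
  Anzahl = c(1, 2, 3, 4, 5, 6),
  stringsAsFactors = TRUE
)

# mean() on a factor does not stop: it warns and returns NA
m <- suppressWarnings(mean(df$Fall))

# sd() calls var(), which raises an error for a factor (as in the message above);
# tryCatch() captures the message instead of aborting
err <- tryCatch(sd(df$Fall), error = function(e) conditionMessage(e))

# Formula interface: only Anzahl is passed to sd(), the factors define the groups
res <- aggregate(Anzahl ~ Fall + DRG + DRG2, data = df, FUN = sd, na.rm = TRUE)
print(res)
```

Groups with a single row get `NA` here, because the standard deviation of one value is undefined.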
Regards
 
     
    