dplyr and length does not group_by

Question

df%>%
    group_by(variable1)%>%
    summarise(length=length(levels(df$variable2)))

group_by does not seem to work: I get the same result for every level of variable1.

See also [here](http://stackoverflow.com/questions/1195826/drop-factor-levels-in-a-subsetted-data-frame) — David Arenburg, Jan 24 '16 at 13:16

akrun · Accepted Answer · 2016-01-24T13:09:45.887

5

We need to remove `df$`. Inside summarise, `levels(df$variable2)` refers to the column in the full dataset, so it returns the same levels for every group. Even without `df$`, a factor keeps its unused levels within each group unless we drop them with `droplevels`.

df %>%
   group_by(variable1)%>%
   summarise(length=length(levels(droplevels(variable2))))
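
To see why the unused levels matter, here is a minimal base R sketch. The factor `f` and its subset `g` are made up for illustration and are not part of the question's data.

```r
# Hypothetical factor with three levels (illustration only)
f <- factor(c("a", "b", "c", "a"))

# Subsetting keeps all original levels, even those no longer present
g <- f[f == "a"]
n_all <- length(levels(g))                 # counts "a", "b" and "c"

# droplevels() discards the levels that do not occur in the subset
n_used <- length(levels(droplevels(g)))    # counts only "a"

print(c(all_levels = n_all, used_levels = n_used))
```

The same thing happens per group inside summarise: each group's slice of the factor still carries every level of the original column.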

Alternatively, instead of going through the levels, we can use n_distinct, which counts the distinct values within each group:

 df %>% 
   group_by(variable1) %>% 
   summarise(length=n_distinct(variable2))
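
The two approaches agree as long as there are no missing values. As a hedged sketch of where they can differ: `n_distinct` counts `NA` as a value unless `na.rm = TRUE` is passed, whereas `levels` never includes `NA`. The data frame `d` and its columns `g` and `v` below are hypothetical and only serve to illustrate this.

```r
library(dplyr)

# Hypothetical data (illustration only): group "a" contains one NA
d <- data.frame(g = c("a", "a", "a", "b"),
                v = factor(c("x", NA, "x", "y")))

res <- d %>%
  group_by(g) %>%
  summarise(by_levels   = length(levels(droplevels(v))),  # NA is never a level
            by_distinct = n_distinct(v),                  # NA counted as a value
            no_na       = n_distinct(v, na.rm = TRUE))    # NA ignored

print(res)
```

If the column may contain `NA`, passing `na.rm = TRUE` makes `n_distinct` match the droplevels result.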

data

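set.seed(24)
# stringsAsFactors=TRUE is needed on R >= 4.0.0, where character columns
# are no longer converted to factors by default
df <- data.frame(variable1=sample(letters[1:3], 
   10,replace=TRUE), variable2= sample(letters[1:5],10, replace=TRUE),
   stringsAsFactors=TRUE)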

edited Jan 24 '16 at 13:09

answered Jan 24 '16 at 12:47


@DavidArenburg I think the question was straightforward when we consider the general behavior of `levels` in a `factor`. – akrun Jan 24 '16 at 13:19
