R noob here, working in tidyverse / RStudio.
I have a categorical / factor variable that I'd like to retain in a group_by/summarize workflow. I'd like to summarize it using a summary function that returns the most common value of that factor within each group.
Is there a summary function I can use for this?
mean returns NA, median only works with numeric data, and summary gives me separate rows with counts of each factor level instead of the most common level.
Edit: example using subset of mtcars dataset:
mpg cyl disp hp drat wt qsec vs am gear carb
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <fct>
21 6 160 110 3.9 2.62 16.5 0 1 4 4
21 6 160 110 3.9 2.88 17.0 0 1 4 4
22.8 4 108 93 3.85 2.32 18.6 1 1 4 1
21.4 6 258 110 3.08 3.22 19.4 1 0 3 1
18.7 8 360 175 3.15 3.44 17.0 0 0 3 2
18.1 6 225 105 2.76 3.46 20.2 1 0 3 1
14.3 8 360 245 3.21 3.57 15.8 0 0 3 4
24.4 4 147. 62 3.69 3.19 20 1 0 4 2
22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2
19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4
Here I have converted carb into a factor variable. In this subset of the data, you can see that among 6-cylinder cars there are 3 with carb=4 and 1 with carb=1; similarly among 4-cylinder cars there are 2 with carb=2 and 1 with carb=1.
So if I do:
data %>% group_by(cyl) %>% summarise(modalcarb = FUNC(carb))
where FUNC is the function I'm looking for, I should get:
cyl carb
<dbl> <fct>
4 2
6 4
8 2 # there are multiple potential ways of handling multi-modal situations, but that's secondary here
Hope that makes sense!