In a grouped data frame, I want to generate a new column for the size (number of observations) of each group.
Whereas this is easy enough with dplyr functions, I struggle to find how to do it using tools from collapse package.
Let's take the mpg dataset from ggplot2. Once we group the data by manufacturer, we then want to compute an additional column that accounts for whether each row belongs to a group with of size > 10. With dplyr it's as easy as mutate(more_than_10_obs = n() > 10).
library(ggplot2) ## for the `mpg` dataset
library(dplyr, warn.conflicts = FALSE)
mpg %>%
select(manufacturer) %>%
group_by(manufacturer) %>%
mutate(more_than_10_obs = n() > 10)
#> # A tibble: 234 x 2
#> # Groups: manufacturer [15]
#> manufacturer more_than_10_obs
#> <chr> <lgl>
#> 1 audi TRUE
#> 2 audi TRUE
#> 3 audi TRUE
#> 4 audi TRUE
#> 5 audi TRUE
#> 6 audi TRUE
#> 7 audi TRUE
#> 8 audi TRUE
#> 9 audi TRUE
#> 10 audi TRUE
#> # ... with 224 more rows
Created on 2021-08-24 by the reprex package (v2.0.0)
But I have very large data and I want to speed up processing time, therefore use tools from collapse package. So following this answer, I've tried to first generate a new column that simply gives the number of observations in the group that each row belongs to.
library(collapse)
library(ggplot2)
library(magrittr) ## for pipe operator
mpg %>%
fselect(manufacturer) %>%
fgroup_by(manufacturer) %>%
ftransform(idx = seq_row(.)) %>%
ftransform(n_in_group = fNobs(idx))
#> # A tibble: 234 x 3
#> manufacturer idx n_in_group
#> * <chr> <int> <int>
#> 1 audi 1 234
#> 2 audi 2 234
#> 3 audi 3 234
#> 4 audi 4 234
#> 5 audi 5 234
#> 6 audi 6 234
#> 7 audi 7 234
#> 8 audi 8 234
#> 9 audi 9 234
#> 10 audi 10 234
#> # ... with 224 more rows
#>
#> Grouped by: manufacturer [15 | 16 (11)]
Created on 2021-08-24 by the reprex package (v2.0.0)
My attempt is unsuccessful because column n_in_group gives the total number of rows in mpg, rather than the size of each group.
How can I utilize collapse tools to achieve the same output as in my dplyr-based code shown at the beginning?