Here are some examples of determining and imputing the mode in a pipe.
Functions to calculate the mode:
library(tidyverse)
# Single mode (returns only the first mode if there's more than one)
# https://stackoverflow.com/a/8189441/496488
# Modified to remove NA
Mode <- function(x) {
  ux <- na.omit(unique(x))
  ux[which.max(tabulate(match(x, ux)))]
}
# Return all modes if there's more than one
# https://stackoverflow.com/a/8189441/496488
# Modified to remove NA
Modes <- function(x) {
  ux <- na.omit(unique(x))
  tab <- tabulate(match(x, ux))
  ux[tab == max(tab)]
}
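To make the difference between the two functions concrete, here is a small sketch on a made-up vector (the vector `x` is purely illustrative and not part of the original example). It repeats the two definitions so that it runs on its own in base R.

```r
# Same definitions as above, repeated so this snippet is self-contained
Mode <- function(x) {
  ux <- na.omit(unique(x))
  ux[which.max(tabulate(match(x, ux)))]
}
Modes <- function(x) {
  ux <- na.omit(unique(x))
  tab <- tabulate(match(x, ux))
  ux[tab == max(tab)]
}

# Illustrative data: 2 and 3 each appear twice, and there is one NA
x <- c(2, 2, 3, 3, NA, 5)

Mode(x)
#> [1] 2
Modes(x)
#> [1] 2 3
```

The NA is ignored in both cases because `match()` returns NA for it and `tabulate()` skips NA values.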
Apply the functions to a data frame:
iris %>%
  summarise(across(everything(), Mode))
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1            5           3          1.4         0.2  setosa
iris %>% map(Modes)
#> $Sepal.Length
#> [1] 5
#>
#> $Sepal.Width
#> [1] 3
#>
#> $Petal.Length
#> [1] 1.4 1.5
#>
#> $Petal.Width
#> [1] 0.2
#>
#> $Species
#> [1] setosa     versicolor virginica
#> Levels: setosa versicolor virginica
Impute missing data using the mode. Note that this uses Mode, which returns only the first mode when several values are tied for most frequent, so you may need a different approach if your data have multiple modes.
# Create missing data
d = iris
d[1, ] = rep(NA, ncol(iris))
head(d)
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1           NA          NA           NA          NA    <NA>
#> 2          4.9         3.0          1.4         0.2  setosa
#> 3          4.7         3.2          1.3         0.2  setosa
#> 4          4.6         3.1          1.5         0.2  setosa
#> 5          5.0         3.6          1.4         0.2  setosa
#> 6          5.4         3.9          1.7         0.4  setosa
# Replace missing values with the mode
d = d %>%
  mutate(across(everything(), ~coalesce(., Mode(.))))
head(d)
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
#> 1          5.0         3.0          1.5         0.2 versicolor
#> 2          4.9         3.0          1.4         0.2     setosa
#> 3          4.7         3.2          1.3         0.2     setosa
#> 4          4.6         3.1          1.5         0.2     setosa
#> 5          5.0         3.6          1.4         0.2     setosa
#> 6          5.4         3.9          1.7         0.4     setosa
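Row 1 is filled with versicolor rather than setosa because blanking that row leaves versicolor and virginica tied as the most frequent species, and Mode returns the first of the tied values. If always taking the first mode is not what you want, one possible adjustment is to draw each replacement at random from the tied modes. The sketch below is only one way to do that: `impute_mode_random` is a hypothetical helper name, not part of the original answer, and it assumes the vector has at least one non-missing value.

```r
# Same Modes() as above, repeated so this snippet is self-contained
Modes <- function(x) {
  ux <- na.omit(unique(x))
  tab <- tabulate(match(x, ux))
  ux[tab == max(tab)]
}

# Hypothetical helper: fill each NA with a randomly chosen mode.
# Assumes x contains at least one non-missing value.
impute_mode_random <- function(x) {
  m <- Modes(x)
  missing <- is.na(x)
  # Sample positions in m rather than calling sample(m, ...) directly,
  # because sample() treats a single number n as the sequence 1:n.
  x[missing] <- m[sample.int(length(m), sum(missing), replace = TRUE)]
  x
}

# Illustrative data: "a" and "b" are tied, with two missing values
set.seed(1)
x <- c("a", "a", "b", "b", "c", NA, NA)
filled <- impute_mode_random(x)
```

In a pipe, this could replace the coalesce step, for example `d %>% mutate(across(everything(), impute_mode_random))`. Random draws make the result depend on the seed, so set one if you need reproducible output.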