using pipes for unique() function

Question

Below is the code I used to do a mode imputation for the column status_group of the dataset tan1. How do I rewrite it using pipes? The unique() function does not seem to work in pipes.

NA_stat <- unique(tan1$status_group[!is.na(tan1$status_group)])

mode <- NA_stat[which.max(tabulate(match(tan1$status_group, NA_stat)))]

tan1$status_group[is.na(tan1$status_group)] <- mode

Also, how do I apply this same process for multiple columns?

Hard to know without knowing your data, maybe `dplyr::distinct()` could be useful, however would be great if you share with us `dput(tan1)` — AlSub, Feb 05 '21 at 18:28

eipi10 · Accepted Answer · 2021-02-05T18:49:46.130

Here are some examples of determining and imputing the mode in a pipe.

Functions to calculate mode:

library(tidyverse)

# Single mode (returns only the first mode if there's more than one)
# https://stackoverflow.com/a/8189441/496488
# Modified to remove NA
Mode <- function(x) {
  ux <- na.omit(unique(x))
  ux[which.max(tabulate(match(x, ux)))]
}

# Return all modes if there's more than one
# https://stackoverflow.com/a/8189441/496488
# Modified to remove NA
Modes <- function(x) {
  ux <- na.omit(unique(x))
  tab <- tabulate(match(x, ux))
  ux[tab == max(tab)]
}
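
As a quick check of how the two helpers differ when values tie, here is a minimal sketch on a made-up vector (the vector `x` and the names `first_mode` and `all_modes` are illustrative only, not from the question). The helpers are repeated so the snippet runs on its own.

```r
# Same helpers as above, repeated so this snippet is self-contained
Mode <- function(x) {
  ux <- na.omit(unique(x))
  ux[which.max(tabulate(match(x, ux)))]
}
Modes <- function(x) {
  ux <- na.omit(unique(x))
  tab <- tabulate(match(x, ux))
  ux[tab == max(tab)]
}

# Made-up vector: "b" and "a" each appear twice, plus one NA
x <- c("b", "a", NA, "a", "b", "c")

first_mode <- Mode(x)   # the tied value that appears first in x
all_modes  <- Modes(x)  # every tied value, in order of first appearance

print(first_mode)
print(all_modes)
```

Which tied value Mode returns depends on the order of the data, so it is worth checking with Modes before relying on it for imputation.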

Apply the functions to a data frame:

iris %>% 
  summarise(across(everything(), Mode))
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1            5           3          1.4         0.2  setosa

iris %>% map(Modes)
#> $Sepal.Length
#> [1] 5
#> 
#> $Sepal.Width
#> [1] 3
#> 
#> $Petal.Length
#> [1] 1.4 1.5
#> 
#> $Petal.Width
#> [1] 0.2
#> 
#> $Species
#> [1] setosa     versicolor virginica 
#> Levels: setosa versicolor virginica

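Impute missing data using the mode. Note that this uses Mode, which returns only the first mode when several values tie, so you may need to adjust the method if your data has multiple modes. In the example below, blanking row 1 leaves setosa with 49 rows against 50 each for versicolor and virginica, which is why the missing Species is filled with versicolor.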

# Create missing data
d = iris
d[1, ] = rep(NA, ncol(iris))

head(d)
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1           NA          NA           NA          NA    <NA>
#> 2          4.9         3.0          1.4         0.2  setosa
#> 3          4.7         3.2          1.3         0.2  setosa
#> 4          4.6         3.1          1.5         0.2  setosa
#> 5          5.0         3.6          1.4         0.2  setosa
#> 6          5.4         3.9          1.7         0.4  setosa

# Replace missing values with the mode
d = d %>% 
  mutate(across(everything(), ~coalesce(., Mode(.))))

head(d)
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
#> 1          5.0         3.0          1.5         0.2 versicolor
#> 2          4.9         3.0          1.4         0.2     setosa
#> 3          4.7         3.2          1.3         0.2     setosa
#> 4          4.6         3.1          1.5         0.2     setosa
#> 5          5.0         3.6          1.4         0.2     setosa
#> 6          5.4         3.9          1.7         0.4     setosa
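
To apply this to the single column from the question, or to a chosen subset of columns rather than everything(), the selection inside across() can be narrowed. The following is only a sketch: the small `tan1` below is a made-up stand-in, `status_group` is the only column name taken from the question, and `basin`, `population` and all values are hypothetical.

```r
library(dplyr)

# Single mode, ignoring NA (same helper as above)
Mode <- function(x) {
  ux <- na.omit(unique(x))
  ux[which.max(tabulate(match(x, ux)))]
}

# Hypothetical stand-in for tan1; only status_group comes from the question
tan1 <- tibble(
  status_group = c("functional", NA, "functional", "non functional"),
  basin        = c("Pangani", "Pangani", NA, "Rufiji"),
  population   = c(100, 250, NA, 250)
)

# One column: the piped equivalent of the three base R lines in the question
one_col <- tan1 %>%
  mutate(status_group = coalesce(status_group, Mode(status_group)))

# Several named columns at once; population is left untouched
imputed <- tan1 %>%
  mutate(across(c(status_group, basin), ~ coalesce(.x, Mode(.x))))

# Every column of a given type
all_chr <- tan1 %>%
  mutate(across(where(is.character), ~ coalesce(.x, Mode(.x))))
```

Listing the columns explicitly keeps the imputation away from columns where the mode is not a sensible fill value, such as continuous measurements.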
