To streamline data wrangling, I write a wrapper function consisting of several "verb functions" that process the data. Each one performs a single task. However, not all tasks apply to every dataset that passes through this process, and for certain data I might want to switch off some "verb functions" and skip them.
I'm trying to understand whether there's a conventional/canonical way to build such a workflow within a wrapper function in R. Importantly, it should be efficient in terms of both performance and code conciseness.
Example
As part of data wrangling, I want to carry out several steps:
- Clean up column headers (using janitor::clean_names())
- Recode values in the data, such that TRUE and FALSE are replaced with 1 and 0 (using gsub())
- Recode string values to lowercase (using tolower())
- Pivot wider based on a specific id column (using tidyr::pivot_wider())
- Drop rows with NA values (using tidyr::drop_na())
Toy data
library(stringi)
library(tidyr)
set.seed(2021)
# simulate data
df <-
  data.frame(id = 1:20,
           isMale = rep(c("true", "false"), times = 10),
           WEIGHT = sample(50:100, 20),
           hash_Numb = stri_rand_strings(20, 5)) %>%
  cbind(., score = sample(200:800, size = 20))
# sprinkle NAs randomly
df[c("isMale", "WEIGHT", "hash_Numb", "score")] <-
  lapply(df[c("isMale", "WEIGHT", "hash_Numb", "score")], function(x) {
    x[sample(seq_along(x), 0.25 * length(x))] <- NA
    x
  })
df <- 
  df %>%
  tidyr::expand_grid(., Condition = c("A","B"))
df
#> # A tibble: 40 x 6
#>       id isMale WEIGHT hash_Numb score Condition
#>    <int> <chr>   <int> <chr>     <int> <chr>    
#>  1     1 <NA>       56 EvRAq        NA A        
#>  2     1 <NA>       56 EvRAq        NA B        
#>  3     2 false      87 <NA>        322 A        
#>  4     2 false      87 <NA>        322 B        
#>  5     3 true       95 13pXe       492 A        
#>  6     3 true       95 13pXe       492 B        
#>  7     4 <NA>       88 4WMBS       626 A        
#>  8     4 <NA>       88 4WMBS       626 B        
#>  9     5 true       NA Nrl1W       396 A        
#> 10     5 true       NA Nrl1W       396 B        
#> # ... with 30 more rows
Created on 2021-03-03 by the reprex package (v0.3.0)
The data shows test scores of 20 people who took a test under two conditions. For each person we also know the gender (isMale), the weight in kilograms (WEIGHT), and a unique hash number (hash_Numb).
Data cleanup and wrangling
Before this data is sent to analysis, it needs to be cleaned up, according to a certain chain of steps, which I laid out above.
library(janitor)
library(dplyr)
# helper function
convert_true_false_to_1_0 <- function(x) {
  
  first_pass <- gsub("^(?:TRUE)$", 1, x, ignore.case = TRUE)
  gsub("^(?:FALSE)$", 0, first_pass, ignore.case = TRUE)
}
# chain of steps
df %>%
  janitor::clean_names() %>%
  mutate(across(everything(), convert_true_false_to_1_0)) %>%
  mutate(across(everything(), tolower)) %>%
  pivot_wider(names_from = condition, values_from = score) %>%
  drop_na()
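To make the recoding step concrete, here is a small base-R check of the helper on its own. The input vectors are made up for illustration. The helper is repeated so the snippet runs by itself. Note that gsub() always returns character, which is why every column (including id and weight) ends up as <chr> after across(everything(), ...).

```r
# Helper repeated from above so this snippet is self-contained.
convert_true_false_to_1_0 <- function(x) {
  first_pass <- gsub("^(?:TRUE)$", 1, x, ignore.case = TRUE)
  gsub("^(?:FALSE)$", 0, first_pass, ignore.case = TRUE)
}

# Only whole-string matches are recoded; NA stays NA; other strings are untouched.
recoded_chr <- convert_true_false_to_1_0(c("true", "FALSE", NA, "truest"))

# Non-character input is coerced to character by gsub().
recoded_int <- convert_true_false_to_1_0(c(56L, NA))

print(recoded_chr)
print(recoded_int)
```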
My question: how can I pack this process into a wrapper that lets me flexibly switch some steps off?
One idea I have in my mind is to use a %>% pipe with conditionals such as:
my_wrangling_wrapper <- function(dat,
                                 clean_names       = TRUE, 
                                 convert_tf_to_1_0 = TRUE, 
                                 convert_to_lower  = TRUE, 
                                 pivot_widr        = TRUE,
                                 drp_na            = TRUE){
  dat %>%
    {if (clean_names)       janitor::clean_names(.)                                     else .} %>%
    {if (convert_tf_to_1_0) mutate(., across(everything(), convert_true_false_to_1_0))  else .} %>%
    {if (convert_to_lower)  mutate(., across(everything(), tolower))                    else .} %>%
    {if (pivot_widr)        pivot_wider(., names_from = condition, values_from = score) else .} %>%
    {if (drp_na)            drop_na(.)                                                  else .}
}
This way, all steps run by default unless explicitly turned off:
- Use-case #1 -- Default run:
 
> my_wrangling_wrapper(dat = df)
## # A tibble: 6 x 6
##   id    is_male weight hash_numb a     b    
##   <chr> <chr>   <chr>  <chr>     <chr> <chr>
## 1 3     1       95     13pxe     492   492  
## 2 9     1       54     hgzxp     519   519  
## 3 12    0       72     vwetc     446   446  
## 4 15    1       52     qadxc     501   501  
## 5 17    1       71     g42vg     756   756  
## 6 18    0       80     qiejd     712   712 
- Use-case #2 -- Don't convert true/false to 1/0 and don't drop NAs:
> my_wrangling_wrapper(dat = df, convert_tf_to_1_0 = FALSE, drp_na = FALSE)
## # A tibble: 20 x 6
##    id    is_male weight hash_numb a     b    
##    <chr> <chr>   <chr>  <chr>     <chr> <chr>
##  1 1     NA      56     evraq     NA    NA   
##  2 2     false   87     NA        322   322  
##  3 3     true    95     13pxe     492   492  
##  4 4     NA      88     4wmbs     626   626  
##  5 5     true    NA     nrl1w     396   396  
##  6 6     false   NA     4oq74     386   386  
##  7 7     true    NA     gg23f     NA    NA   
##  8 8     false   94     NA        NA    NA   
##  9 9     true    54     hgzxp     519   519  
## 10 10    false   97     NA        371   371  
## 11 11    true    90     NA        768   768  
## 12 12    false   72     vwetc     446   446  
## 13 13    NA      NA     jkhjh     338   338  
## 14 14    false   NA     0swem     778   778  
## 15 15    true    52     qadxc     501   501  
## 16 16    false   75     NA        219   219  
## 17 17    true    71     g42vg     756   756  
## 18 18    false   80     qiejd     712   712  
## 19 19    NA      68     tadad     NA    NA   
## 20 20    NA      53     iyw3o     NA    NA  
My problem
Although the solution I came up with works, I've learned that relying on the pipe operator inside functions is not advised, because it slows things down (see reference). Also, since %>% is not part of base R, there should be a way to achieve the same "tweakable wrapper" functionality without the pipe. So I wonder: is there a conventional way to write a wrapper function whose components can be switched off individually, while still remaining efficient overall?
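To illustrate the kind of pipe-free, switchable wrapper I have in mind, here is a minimal base-R sketch. It is only an assumption about one possible shape, not a claim that this is the conventional way. The names run_steps, steps, and skip are hypothetical, the steps are simplified base-R stand-ins for the janitor/dplyr/tidyr calls above, and the pivot step is left out to keep it short.

```r
# Helper repeated from above so this snippet is self-contained.
convert_true_false_to_1_0 <- function(x) {
  first_pass <- gsub("^(?:TRUE)$", 1, x, ignore.case = TRUE)
  gsub("^(?:FALSE)$", 0, first_pass, ignore.case = TRUE)
}

# Hypothetical base-R stand-ins: each step takes a data frame and returns one.
steps <- list(
  clean_names       = function(d) { names(d) <- tolower(names(d)); d },
  convert_tf_to_1_0 = function(d) { d[] <- lapply(d, convert_true_false_to_1_0); d },
  convert_to_lower  = function(d) { d[] <- lapply(d, tolower); d },
  drp_na            = function(d) d[complete.cases(d), , drop = FALSE]
)

# Hypothetical runner: applies every step in order, except those named in `skip`.
run_steps <- function(dat, steps, skip = character()) {
  stopifnot(all(skip %in% names(steps)))
  for (step in steps[setdiff(names(steps), skip)]) {
    dat <- step(dat)
  }
  dat
}

# Made-up toy input, much smaller than the data above.
toy <- data.frame(isMale = c("true", "FALSE", NA),
                  Name   = c("Ann", "BOB", "Cy"),
                  stringsAsFactors = FALSE)

res_default <- run_steps(toy, steps)
res_custom  <- run_steps(toy, steps, skip = c("convert_tf_to_1_0", "drp_na"))
```

In this sketch the steps live in a named list rather than in separate logical arguments, so switching a step off means naming it in skip. Whether this is faster than the piped version is something I have not measured.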
{It's worth mentioning that I've asked a similar question regarding building a wrapper for ggplot, turning geoms off as desired. The answer was great but not applicable to the current question.}