Is there a way to dynamically/programmatically generate case_when conditions in dplyr with different column names and/or different numbers of conditions? I have an interactive script that I'm trying to convert into a function. There's a lot of repeated code in the case_when statements and I'm wondering if it can be automated somehow without my needing to write everything from scratch again and again.
Here's a dummy dataset:
test_df = tibble(low_A=c(5, 15, NA),
                 low_TOT=c(NA, 10, NA),
                 low_B=c(20, 25, 30),
                 high_A=c(NA, NA, 10),
                 high_TOT=c(NA, 40, NA),
                 high_B=c(60, 20, NA))
expected_df = tibble(low_A=c(5, 15, NA),
                     low_TOT=c(NA, 10, NA),
                     low_B=c(20, 25, 30),
                     ans_low=c(5, 10, 30),
                     high_A=c(NA, NA, 10),
                     high_TOT=c(NA, 40, NA),
                     high_B=c(60, 20, NA),
                     ans_high=c(60, 40, 10))
> expected_df
# A tibble: 3 x 8
  low_A low_TOT low_B ans_low high_A high_TOT high_B ans_high
  <dbl>   <dbl> <dbl>   <dbl>  <dbl>    <dbl>  <dbl>    <dbl>
1     5      NA    20       5     NA       NA     60       60
2    15      10    25      10     NA       40     20       40
3    NA      NA    30      30     10       NA     NA       10
The logic I want is that if the ._TOT column has a value, use that. If not, then try column ._A, and if not, then column ._B. Note that I intentionally didn't put ._TOT as the first column for a group. I could just use coalesce() in that case, but I want a general solution irrespective of column order.
Of course, all of this is easy to do with a couple of case_when statements. My issues are that:
- I'm trying to make a general function and so don't want interactive/tidy evaluation.
- I have a whole bunch of columns like this. All ending with one of _TOT, _A, _Bbut with different prefixes (e.g.,low_TOT, low_A, low_B, high_TOT, high_A, high_B,.....and I don't want to rewrite a bunch ofcase_whenfunctions again and again.
What I have right now looks like this (where I'm writing a case_when for each prefix):
def my_function = function(df) { 
    df %>% mutate(
          # If a total low doesn't exist, use A (if exists) or B (if exists)
          "ans_low" := case_when(
            !is.na(.data[["low_TOT"]]) ~ .data[["low_TOT"]],
            !is.na(.data[["low_A"]]) ~ .data[["low_A"]],
            !is.na(.data[["low_B"]]) ~ .data[["low_B"]],
          ),
          # If a total high doesn't exist, use A (if exists) or B (if exists)
          "ans_high" := case_when(
            !is.na(.data[["high_TOT"]]) ~ .data[["high_TOT"]],
            !is.na(.data[["high_A"]]) ~ .data[["high_R"]],
            !is.na(.data[["high_B"]]) ~ .data[["high_B"]],
              
         # Plus a whole bunch of similar case_when functions...
}
And what I'd like is to ideally get a way to dynamically generate case_when functions with different conditions so that I'm not writing a new case_when each time by exploiting the fact that:
- All the three conditions have the same general form, and the same structure for the variable names, but with a different prefix (high_,low_, etc.).
- They have the same formula of the form !is.na( .data[[ . ]]) ~ .data[[ . ]], where the dot(.) is the dynamically generated name of the column.
What I'd like is something like:
def my_function = function(df) { 
    df %>% mutate(
          "ans_low" := some_func(prefix="Low"),
          "ans_high" := some_func(prefix="High")
}
I tried creating my own case_when generator to replace the standard case_when as shown below, but I'm getting an error. I'm guessing that's because .data doesn't really work outside of the tidyverse functions?
some_func = function(prefix) {
  case_when(
    !is.na(.data[[ sprintf("%s_TOT", prefix) ]]) ~ .data[[ sprintf("%s_TOT", prefix) ]],
    !is.na(.data[[ sprintf("%s_A", prefix) ]]) ~ .data[[ sprintf("%s_A", prefix) ]],
    !is.na(.data[[ sprintf("%s_B", prefix) ]]) ~ .data[[ sprintf("%s_B", prefix) ]]
  )
}
Something else I'm curious about is making an even more general case_when generator. In the examples thus far, it's only the names (prefix) of the columns that are changing. What if I wanted to
- change the number and names of suffixes (e.g., high_W, high_X, high_Y, high_Z, low_W, low_X, low_Y, low_Z, .......) and so make a character vector of suffixes an argument ofsome_func
- change the form of the formula. Right now, it's of the form !is.na(.data[[ . ]]) ~ .data[[ . ]]for all the conditions, but what if I wanted to make this an argument ofsome_func? For example,!is.na(.data[[ . ]]) ~ sprintf("%s is missing", .)
I'd be happy with just getting it to work with different prefixes but it'd be very cool to understand how I could achieve something even more general with arbitrary (but common) suffixes and arbitrary formulae such that I can do some_func(prefix, suffixes, formula).
 
     
     
     
     
    