I'm sure there is a better way to ask this question, and perhaps if I knew it I'd have found an existing answer. The gist is this:
- I'm reading in SQL data with R with NEW, UPDATE, and DELETE rows
- table uniqueness is determined by values in two columns (call them
aandb) - I want (a) find all combinations of
aandbin which columnchas a DELETE value, and (b) then remove all rows with one of theseaandbcombinations
Here's a reproducible example:
library(dplyr)
df <- data.frame(
a = c(1, 1, 1, 2, 2),
b = c(3, 3, 4, 5, 5),
c = c("new", "delete", "new", "new", "update")
)
What I tried, not thinking this through:
df %>%
mutate(filter_key = paste0(a, "_", b)) %>%
filter(!(filter_key %in% (df %>%
filter(c == "delete") %>%
pull(filter_key) %>%
unique())))
The problem is that filter_key is not in df, so it's not there to pull unique values for. I'm curious if there is a way to refer to it "on the fly"? Some answers led me to think I could perhaps use . or .data, but this didn't work for me (replacing df with . or .data).
The trivial solution is to add it beforehand:
df <- df %>% mutate(filter_key = paste0(a, "_", b))
df %>%
filter(!(filter_key %in% (df %>%
filter(c == "delete") %>%
pull(filter_key) %>%
unique())))
Alternatively (though this changes the question, so I'll leave it in these "notes"), there's probably a way to just filter directly, but I wasn't sure how to say:
- give me all the combos of
aandbwhere any row hasc == "delete" - remove all rows where
aandbmatch one of these combos