Let's assume I have the following lookup table:
(lkp <- structure(list(a = c("a", "a", "a", "b", "c"),
b = c("a1 a2", "a3 a2", "a3", "a1", "a1")),
row.names = c("lkp_1", "lkp_2", "lkp_3", "lkp_4", "lkp_5"),
class = "data.frame"))
#       a     b
# lkp_1 a a1 a2
# lkp_2 a a3 a2
# lkp_3 a    a3
# lkp_4 b    a1
# lkp_5 c    a1
I want to check whether another data.frame, say x, is a subset of lkp, with one important additional requirement: for column b, a row counts as matching if lkp$b merely contains x$b, rather than being equal to it.
The following example should make clear what I mean:
(chk <- list(c1 = structure(list(a = c("a", "a"), b = c("a2", "a2")), row.names = c(NA, -2L), class = "data.frame"),
c2 = structure(list(a = "b", b = "a1"), row.names = c(NA, -1L), class = "data.frame"),
c3 = structure(list(a = c("a", "a"), b = c("a1", "a1")), row.names = c(NA, -2L), class = "data.frame"),
c4 = structure(list(a = c("a", "a"), b = c("a3", "a2")), row.names = c(NA, -2L), class = "data.frame")))
# $c1
#   a  b
# 1 a a2
# 2 a a2
# $c2
#   a  b
# 1 b a1
# $c3
#   a  b
# 1 a a1
# 2 a a1
# $c4
#   a  b
# 1 a a3
# 2 a a2
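- chk$c1: row 1 matches row lkp_1 (and lkp_2), as column a is the same and lkp$b contains a2. Row 2 matches the same two rows, so each row can be paired with a different row of lkp.
- chk$c2 and chk$c4 match as well.
- chk$c3 does NOT match. While each row matches lkp_1, c3 is not a subset, as lkp would need to contain 2 different rows which match.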
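To make the row-level rule concrete, here is a minimal sketch that builds the match matrix for chk$c3 against lkp. It assumes the entries of lkp$b are space-separated tokens; the helper name row_matches is made up for illustration only.

```r
lkp <- data.frame(
  a = c("a", "a", "a", "b", "c"),
  b = c("a1 a2", "a3 a2", "a3", "a1", "a1"),
  row.names = paste0("lkp_", 1:5),
  stringsAsFactors = FALSE
)
c3 <- data.frame(a = c("a", "a"), b = c("a1", "a1"),
                 stringsAsFactors = FALSE)

# Assumption: lkp$b holds space-separated tokens.
lkp_tokens <- strsplit(lkp$b, " ", fixed = TRUE)

# Hypothetical helper: which lkp rows match a single (a, b) pair?
# Column a must be equal; column b only needs to be one of the tokens.
row_matches <- function(a, b) {
  lkp$a == a & vapply(lkp_tokens, function(tok) b %in% tok, logical(1))
}

# One row per row of c3, one column per row of lkp.
m <- t(mapply(row_matches, c3$a, c3$b))
dimnames(m) <- list(rownames(c3), rownames(lkp))
m
```

Both rows of m are TRUE only in column lkp_1, so there is no way to give each row of c3 its own row of lkp. That is why a plain "does every row match something" check is not enough here.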
In principle I am looking for a merge (or join) where the join condition would use some sort of fuzzy matching.
I have found and read these two SO answers:
- How to check if a row is a subset of a data.frame?
- R merge data frames, allow inexact ID matching (e.g. with additional characters 1234 matches ab1234 )
The second answer in particular looks promising. However, I do not need approximate matching, but rather some sort of does_contain relationship instead of pure equality. So maybe a regex solution would work?
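For what it's worth, this is roughly what I have in mind for the regex part: a minimal sketch of a "does contain" test with grepl(). It assumes the values in b are plain alphanumeric tokens, so that \\b word boundaries delimit them; values containing regex metacharacters would need escaping first, which this sketch does not do. The name contains_token is hypothetical.

```r
lkp_b <- c("a1 a2", "a3 a2", "a3", "a1", "a1")

# Hypothetical helper: TRUE where `token` occurs as a whole word in `haystack`.
# Assumption: tokens are alphanumeric, so \\b marks their boundaries.
contains_token <- function(token, haystack) {
  grepl(paste0("\\b", token, "\\b"), haystack, perl = TRUE)
}

contains_token("a2", lkp_b)
# [1]  TRUE  TRUE FALSE FALSE FALSE
contains_token("a1", lkp_b)
# [1]  TRUE FALSE FALSE  TRUE  TRUE
```

The word boundaries matter: a plain substring search would also report a1 as contained in a value such as a10.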
Expected Outcome
magic_is_subset_function <- function(chk, lkp) {
# ...
}
sapply(chk, magic_is_subset_function, lkp = lkp)
# [1] TRUE TRUE FALSE TRUE
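One possible direction, as a rough sketch rather than a finished solution: treat the problem as a bipartite matching between the rows of x and the rows of lkp, and search for an assignment with augmenting paths. It assumes that lkp$b holds space-separated tokens and that both columns are character vectors. The names is_subset_sketch and try_assign are made up for illustration.

```r
lkp <- data.frame(
  a = c("a", "a", "a", "b", "c"),
  b = c("a1 a2", "a3 a2", "a3", "a1", "a1"),
  row.names = paste0("lkp_", 1:5),
  stringsAsFactors = FALSE
)
chk <- list(
  c1 = data.frame(a = c("a", "a"), b = c("a2", "a2"), stringsAsFactors = FALSE),
  c2 = data.frame(a = "b", b = "a1", stringsAsFactors = FALSE),
  c3 = data.frame(a = c("a", "a"), b = c("a1", "a1"), stringsAsFactors = FALSE),
  c4 = data.frame(a = c("a", "a"), b = c("a3", "a2"), stringsAsFactors = FALSE)
)

# Hypothetical sketch, not an existing function.
is_subset_sketch <- function(x, lkp) {
  # Assumption: lkp$b holds space-separated tokens.
  tokens <- strsplit(lkp$b, " ", fixed = TRUE)

  # candidates[[i]]: indices of lkp rows that row i of x is allowed to use.
  candidates <- lapply(seq_len(nrow(x)), function(i) {
    has_b <- vapply(tokens, function(tok) x$b[i] %in% tok, logical(1))
    which(lkp$a == x$a[i] & has_b)
  })

  owner <- integer(nrow(lkp))  # owner[j]: row of x using lkp row j (0 = free)
  seen <- logical(nrow(lkp))   # lkp rows already visited in the current search

  # Try to give row i of x its own lkp row, moving earlier rows if needed.
  try_assign <- function(i) {
    for (j in candidates[[i]]) {
      if (seen[j]) next
      seen[j] <<- TRUE
      if (owner[j] == 0L || try_assign(owner[j])) {
        owner[j] <<- i
        return(TRUE)
      }
    }
    FALSE
  }

  for (i in seq_len(nrow(x))) {
    seen[] <- FALSE
    if (!try_assign(i)) return(FALSE)
  }
  TRUE
}

unname(sapply(chk, is_subset_sketch, lkp = lkp))
# [1]  TRUE  TRUE FALSE  TRUE
```

The re-assignment step is what separates this from a greedy pass: for c1, row 1 first takes lkp_1, and row 2 then pushes it over to lkp_2 so that both rows end up with distinct partners.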