I'd like to merge rows of two data frames - df1 and df2 using column A:
#df1
A <- c('ab','ab','bc','bc','bc','cd')
B <- floor(runif(6, min=0, max=10))
C <- floor(runif(6, min=0, max=10))
D <- floor(runif(6, min=0, max=10))
E <- c('a, b, c','a, d, e','a, g, h','d, e, f','a, d, f','f, j')
df1 <- data.frame(A,B,C,D,E)
df1
A B C D E
1 ab 5 4 3 a, b, c
2 ab 9 4 0 a, d, e
3 bc 4 4 9 a, g, h
4 bc 5 5 6 d, e, f
5 bc 1 6 6 a, d, f
6 cd 1 2 0 f, j
#df2
A <- c('ab','bc','cd')
B <- floor(runif(3, min=0, max=10))
E <- c('a, d','d, f','n, m')
df2 <- data.frame(A,B,E)
df2
A B E
1 ab 4 a, d
2 bc 7 d, f
3 cd 1 n, m
I can do simply:
df3 <- merge(x=df1, y=df2, by='A', all.x = TRUE)
However there's condition of merging. Namely, I'd like to merge only rows from df2 to df1 when all substrings (column E) from df2 are present in df1, so the output should look like this:
df3
A B C D E A.y B.y E.y
1 ab 5 4 3 a, b, c NA NA NA
2 ab 9 4 0 a, d, e, ab 6 a, d
3 bc 4 4 9 a, g, h NA NA NA
4 bc 5 5 6 d, e, f bc 7 d, f
5 bc 1 6 6 a, d, f bc 7 d, f
6 cd 1 2 0 f, j NA NA NA
I know there's an option using %in% regarding vector comparison. However I have strings, should I first do some strsplit and unlist and then perform the comparison?