I have two files: one contains keywords (roughly 2,000 rows) and the other contains text (roughly 770,000 rows). The keyword file looks like:
Event Name            Keyword
All-day tabby fest    tabby, all-day
All-day tabby fest    tabby, fest
Maine Coon Grooming   maine coon, groom    
Maine Coon Grooming   coon, groom
library(tibble)
keywordFile <- tibble(EventName = c("All-day tabby fest", "All-day tabby fest", "Maine Coon Grooming","Maine Coon Grooming"), Keyword = c("tabby, all-day", "tabby, fest", "maine coon, groom", "coon, groom"))
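Each Keyword entry above is a single comma-separated string, so `keywordFile$Keyword[j]` is one string such as "tabby, all-day" rather than a vector of separate keywords. A minimal base-R sketch, assuming keywords are always separated by commas (optionally followed by spaces), splits the column into a list of character vectors once up front. `keyword_list` is a hypothetical name, and the data frame is rebuilt with `data.frame()` only so the sketch runs without tibble:

```r
# Example keyword file rebuilt in base R (no tibble needed for this sketch)
keywordFile <- data.frame(
  EventName = c("All-day tabby fest", "All-day tabby fest",
                "Maine Coon Grooming", "Maine Coon Grooming"),
  Keyword   = c("tabby, all-day", "tabby, fest", "maine coon, groom", "coon, groom"),
  stringsAsFactors = FALSE
)

# Split each comma-separated Keyword string into its individual terms and
# trim the space left after each comma (keyword_list is a hypothetical name)
keyword_list <- lapply(strsplit(keywordFile$Keyword, ","), trimws)

keyword_list[[3]]
# [1] "maine coon" "groom"
```

Each element of `keyword_list` is now a character vector, which is the shape the `all(sapply(str, grepl, myStr))` check expects for `str`.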
The text file looks like:
Description
Bring your tabby to the fest on Tuesday
All cats are welcome to the fest on Tuesday
Mainecoon grooming will happen at noon Wednesday
Maine coons will be pampered at noon on Wednesday
text <- tibble(Description = c("Bring your tabby to the fest on Tuesday","All cats are welcome to the fest on Tuesday","Mainecoon grooming will happen at noon Wednesday","Maine coons will be pampered at noon on Wednesday"))
What I want is to iterate through the text file, look for fuzzy matches (a description matches only if it contains every comma-separated keyword in a row's "Keyword" column), and add a new column that displays TRUE or FALSE. When that is TRUE, I want a third column to display the event name. The result should look something like:
Description                                          Match?   Event Name
Bring your tabby to the fest on Tuesday              TRUE     All-day tabby fest
All cats are welcome to the fest on Tuesday          FALSE
Mainecoon grooming will happen at noon Wednesday     TRUE     Maine Coon Grooming
Maine coons will be pampered at noon on Wednesday    FALSE
After converting everything to lowercase, I can successfully fuzzy match a single string with code like this, thanks to Molx (How can I check if multiple strings exist in another string?):
str <- c("tabby", "all-day")
myStr <- "Bring your tabby to the fest on Tuesday"
all(sapply(str, grepl, myStr))
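The single-string check above can be extended to test one description against every keyword row. A minimal sketch, assuming the keyword strings are split on commas first; `event_names`, `keywords`, `keyword_list`, `row_hits`, and `matched_event` are hypothetical names, and `fixed = TRUE` is an added assumption that keywords should be matched as literal text rather than as regular expressions:

```r
# Keyword rows from the example keyword file
event_names <- c("All-day tabby fest", "All-day tabby fest",
                 "Maine Coon Grooming", "Maine Coon Grooming")
keywords <- c("tabby, all-day", "tabby, fest", "maine coon, groom", "coon, groom")

# Split each comma-separated keyword string into separate terms
keyword_list <- lapply(strsplit(keywords, ","), trimws)

myStr <- tolower("Mainecoon grooming will happen at noon Wednesday")

# One TRUE/FALSE per keyword row: TRUE when every term of that row occurs in myStr
row_hits <- vapply(keyword_list, function(terms) {
  all(vapply(terms, grepl, logical(1), x = myStr, fixed = TRUE))
}, logical(1))

row_hits
# [1] FALSE FALSE FALSE  TRUE

# Event name of the first matching row, or NA when nothing matches
matched_event <- if (any(row_hits)) event_names[which(row_hits)[1]] else NA_character_
matched_event
# [1] "Maine Coon Grooming"
```

Only the fourth row ("coon", "groom") matches, because "mainecoon" does not contain "maine coon" with a space.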
However, I am getting stuck when I try to fuzzy match the whole files. I tried something like this:
for (i in seq_along(text$Description)) {
  for (j in seq_along(keywordFile$EventName)) {
    # below I am creating the TRUE/FALSE column
    text$TF[i] <- all(sapply(keywordFile$Keyword[j], grepl,
                             text$Description[i]))
    if (isTRUE(text$TF))
      # below I am creating the EventName column
      text$EventName <- keywordFile$EventName
  }
}
I don't think my problem is converting things into the right vectors and strings: my keywordFile$Keyword column is a bunch of string vectors and my text$Description column is a character string. What I'm struggling with is how to iterate properly through both files. The error I'm getting is:
Error in ... replacement has 13 rows, data has 1
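The error most likely comes from `text$EventName <- keywordFile$EventName`, which assigns the entire EventName column (one value per keyword row) to the data frame instead of a single element. A few other things in the loop would also give wrong results: `keywordFile$Keyword[j]` is one string, so `grepl` searches for the literal text "tabby, all-day"; `isTRUE(text$TF)` checks the whole TF column rather than row i, and is FALSE whenever that column has more than one element; and `text$TF[i]` is overwritten on every j, so only the last keyword row counts. A minimal corrected sketch, rebuilt in base R, splits the keywords first, assigns single elements, and stops at the first matching keyword row (an assumption, since the question doesn't say what should happen when several rows match). Unmatched rows get NA instead of a blank:

```r
# Example data rebuilt in base R
keywordFile <- data.frame(
  EventName = c("All-day tabby fest", "All-day tabby fest",
                "Maine Coon Grooming", "Maine Coon Grooming"),
  Keyword = c("tabby, all-day", "tabby, fest", "maine coon, groom", "coon, groom"),
  stringsAsFactors = FALSE
)
text <- data.frame(
  Description = c("Bring your tabby to the fest on Tuesday",
                  "All cats are welcome to the fest on Tuesday",
                  "Mainecoon grooming will happen at noon Wednesday",
                  "Maine coons will be pampered at noon on Wednesday"),
  stringsAsFactors = FALSE
)

# Split each Keyword string into separate terms once, outside the loop
keyword_list <- lapply(strsplit(keywordFile$Keyword, ","), trimws)

# Preallocate full-length result columns
text$TF <- FALSE
text$EventName <- NA_character_

for (i in seq_along(text$Description)) {
  desc <- tolower(text$Description[i])
  for (j in seq_along(keyword_list)) {
    # TRUE only if every term of keyword row j occurs in description i
    if (all(vapply(keyword_list[[j]], grepl, logical(1), x = desc, fixed = TRUE))) {
      text$TF[i] <- TRUE
      # Assign a single element, not the whole EventName column
      text$EventName[i] <- keywordFile$EventName[j]
      break  # stop at the first matching keyword row
    }
  }
}
```

On the example data this gives TF = TRUE, FALSE, TRUE, FALSE and EventName = "All-day tabby fest", NA, "Maine Coon Grooming", NA, matching the desired output.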
Has anyone done anything like this before?
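With roughly 770,000 descriptions and 2,000 keyword rows, a row-by-row nested loop performs about 1.5 billion keyword-row checks. Because `grepl()` is vectorized over its `x` argument, an alternative that should be considerably faster (a swapped-in technique, not something from the question) is to loop over the keyword rows only and test each term against all descriptions in one call. A minimal base-R sketch, assuming the first matching keyword row wins; `Match` stands in for the "Match?" column, and `desc`, `keyword_list`, `hit`, and `event` are hypothetical names:

```r
# Example data rebuilt in base R
keywordFile <- data.frame(
  EventName = c("All-day tabby fest", "All-day tabby fest",
                "Maine Coon Grooming", "Maine Coon Grooming"),
  Keyword = c("tabby, all-day", "tabby, fest", "maine coon, groom", "coon, groom"),
  stringsAsFactors = FALSE
)
text <- data.frame(
  Description = c("Bring your tabby to the fest on Tuesday",
                  "All cats are welcome to the fest on Tuesday",
                  "Mainecoon grooming will happen at noon Wednesday",
                  "Maine coons will be pampered at noon on Wednesday"),
  stringsAsFactors = FALSE
)

desc <- tolower(text$Description)
keyword_list <- lapply(strsplit(keywordFile$Keyword, ","), trimws)

# Event name per description; stays NA until some keyword row matches
event <- rep(NA_character_, length(desc))

for (j in seq_along(keyword_list)) {
  # Start with every description as a candidate, then AND in each term;
  # each grepl() call checks one term against all descriptions at once
  hit <- rep(TRUE, length(desc))
  for (term in keyword_list[[j]]) {
    hit <- hit & grepl(term, desc, fixed = TRUE)
  }
  # Only fill descriptions that have not matched an earlier keyword row
  event[hit & is.na(event)] <- keywordFile$EventName[j]
}

text$Match <- !is.na(event)
text$EventName <- event
```

The outer loop now runs about 2,000 times instead of 770,000 × 2,000, and the per-description work happens inside `grepl()`.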