Suppose I have the following data:
example = tibble::tibble(
  id = 1:10,
  vac = c("FFizer", "sinovasm", "aztraseneca", "phiser", "sonovac",
          "faizer", "sinivasc", "astraseneca", "sinocav", "aztraxeneca")
)
Which looks like this:
# A tibble: 10 x 2
      id vac        
   <int> <chr>      
 1     1 FFizer     
 2     2 sinovasm   
 3     3 aztraseneca
 4     4 phiser     
 5     5 sonovac    
 6     6 faizer     
 7     7 sinivasc   
 8     8 astraseneca
 9     9 sinocav    
10    10 aztraxeneca
I want to check whether each value of the variable vac matches, to some degree, any option from a vector.
Say the vector to use as identifier is:
labs = c("sinovac", "pfizer", "astrazeneca")
Matching the vac column of the example data frame against the vector labs should give output like this:
correction = tibble::tibble(
  id = 1:10,
  vac = c("FFizer", "sinovasm", "aztraseneca", "phiser", "sonovac",
          "faizer", "sinivasc", "astraseneca", "sinocav", "aztraxeneca"),
  match = c("pfizer", "sinovac", "astrazeneca", "pfizer", "sinovac",
            "pfizer", "sinovac", "astrazeneca", "sinovac", "astrazeneca")
)
Looking like this:
# A tibble: 10 x 3
      id vac         match      
   <int> <chr>       <chr>      
 1     1 FFizer      pfizer     
 2     2 sinovasm    sinovac    
 3     3 aztraseneca astrazeneca
 4     4 phiser      pfizer     
 5     5 sonovac     sinovac    
 6     6 faizer      pfizer     
 7     7 sinivasc    sinovac    
 8     8 astraseneca astrazeneca
 9     9 sinocav     sinovac    
10    10 aztraxeneca astrazeneca
The main idea is to end up with a homogeneous vac variable.
In addition, I'd like to create a variable that indicates the "matching degree". For example, if the string is "FFizer", its match would be "pfizer" and their matching degree would be around 0.66.
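To make "matching degree" concrete, here is a minimal sketch of one possible definition. It is an assumption on my part, not a requirement: 1 minus the case-sensitive Levenshtein distance (from base R's utils::adist) divided by the length of the longer string. Under that assumption "FFizer" vs "pfizer" has distance 2 over length 6, so roughly 0.67. The names d, longest, sim, best, sketch and degree are made up for illustration.

```r
# Data repeated from the example above so the sketch runs on its own
vac  <- c("FFizer", "sinovasm", "aztraseneca", "phiser", "sonovac",
          "faizer", "sinivasc", "astraseneca", "sinocav", "aztraxeneca")
labs <- c("sinovac", "pfizer", "astrazeneca")

# Levenshtein distance between every vac (rows) and every lab (columns)
d <- utils::adist(vac, labs)

# Assumed definition: similarity = 1 - distance / length of the longer string
longest <- outer(nchar(vac), nchar(labs), pmax)
sim <- 1 - d / longest

# For each row, keep the most similar lab and its similarity
best <- max.col(sim, ties.method = "first")
sketch <- data.frame(
  id     = seq_along(vac),
  vac    = vac,
  match  = labs[best],
  degree = round(sim[cbind(seq_along(vac), best)], 2),
  stringsAsFactors = FALSE
)
print(sketch)
```

Any other string similarity (Jaro-Winkler, or a case-insensitive distance, for instance) would be just as acceptable to me, as long as the result has a match column and a degree column like the ones above.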