I have five vectors of column names that are similar but not identical.
I am trying to find a way to correct the entries in vector2, vector3, vector4 and vector5 based on the names in vector1.
I got some ideas here and here, which led to the code below. But I am already stuck at comparing the first two vectors, let alone overwriting them.
library(dplyr)
library(fuzzyjoin)
vector1 <- c("something","nothing", "anything", "number4")
vector2 <- c("some thing","no thing","addition", "anything", "number4")
vector3 <- c("some thing wrong","nothing", "anything_")
vector4 <- c("something","nothingg", "anything", "number_4")
vector5 <- c("something","nothing", "anything happening", "number4")
I started out as follows:
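# Index of the closest vector2 entry for each vector1 entry
apply(adist(x = vector1, y = vector2), 1, which.min)
# Pair each vector1 entry with its closest match in vector2
data.frame(string_to_match = vector1,
           closest_match = vector2[apply(adist(x = vector1, y = vector2), 1, which.min)])
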
  string_to_match closest_match
1       something    some thing
2         nothing      no thing
3        anything      anything
4         number4       number4
Is there any way to add the distance to this result and to overwrite the vector entries based on that distance?
Desired result:
  string_to_match closest_match distance
1       something    some thing        1
2         nothing      no thing        1
3        anything      anything        0
4         number4       number4        0
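One way the distance column might be obtained is to keep the `adist()` matrix and read the minimum back out of it. This is only a minimal sketch: the names `d`, `idx` and `result` are made up for illustration, and it relies on `adist()` returning one row per `vector1` entry and one column per `vector2` entry.

```r
vector1 <- c("something", "nothing", "anything", "number4")
vector2 <- c("some thing", "no thing", "addition", "anything", "number4")

# Pairwise edit distances: rows follow vector1, columns follow vector2.
d <- adist(vector1, vector2)

# Column index of the nearest vector2 entry for each vector1 entry.
idx <- apply(d, 1, which.min)

# Matrix indexing with (row, column) pairs pulls out the matching distances.
result <- data.frame(
  string_to_match  = vector1,
  closest_match    = vector2[idx],
  distance         = d[cbind(seq_along(idx), idx)],
  stringsAsFactors = FALSE
)
print(result)
```

Computing the matrix once also avoids calling `adist()` twice, as the original attempt does.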
And the vectors after correction:
vector1 <- c("something","nothing", "anything", "number4")
vector2 <- c("something","nothing","addition", "anything", "number4")
vector3 <- c("something","nothing", "anything")
vector4 <- c("something","nothing", "anything", "number4")
vector5 <- c("something","nothing", "anything", "number4")
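The overwrite step could perhaps be sketched by reversing the comparison: for each entry of the vector to be corrected, look up the nearest name in `vector1` and replace the entry only if the distance is small enough. The helper `correct_names()` and its cutoff `max_dist = 2` are hypothetical choices for illustration, not an established method.

```r
vector1 <- c("something", "nothing", "anything", "number4")
vector2 <- c("some thing", "no thing", "addition", "anything", "number4")
vector4 <- c("something", "nothingg", "anything", "number_4")

# Hypothetical helper: replace each entry of x with its nearest reference
# name when the edit distance is at most max_dist; otherwise keep the entry.
correct_names <- function(x, reference, max_dist = 2) {
  d    <- adist(x, reference)              # rows: x, columns: reference
  idx  <- apply(d, 1, which.min)           # nearest reference per entry
  best <- d[cbind(seq_along(idx), idx)]    # distance to that nearest name
  ifelse(best <= max_dist, reference[idx], x)
}

vector2_fixed <- correct_names(vector2, vector1)
vector4_fixed <- correct_names(vector4, vector1)
print(vector2_fixed)
print(vector4_fixed)
```

With this cutoff, "addition" is left alone because none of the names in `vector1` is within two edits of it. The same cutoff would also leave "some thing wrong" and "anything happening" unchanged, since they are many edits away from any reference name, so a plain distance threshold does not by itself reproduce the desired vector3 and vector5.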
Is there anyone who can put me on the right track?