I have a list in a data frame of thousands of names in a long list. Many of the names have small differences in them which make them slightly different. I would like to find a way to match these names. For example:
names <- c('jon smith','jon, smith','Jon Smith','jon smith et al','bob seger','bob, seger','bobby seger','bob seger jr.')
I've looked at amatch in the stringdist function, as well as agrep, but these all require a master list of names that are used to match another list of names against.  In my case, I don't have such a master list so I'd like to create one from the data by identifying names with highly similar patterns so I can look at them and decide whether they're the same person (which in many cases they are). I'd like an output in a new column that helps me to know these are a likely match, and maybe some sort of similarity score based on Levenshtein distance or something. Maybe something like this:
            names   match      SimilarityScore
1       jon smith     a               9
2      jon, smith     a               8
3       Jon Smith     a               9
4 jon smith et al     a               5
5       bob seger     b               9
6      bob, seger     b               8
7     bobby seger     b               7
8   bob seger jr.     b               5
Is something like this possible?
 
     
    


