I'm looking to display all possible duplicate entries from my MySQL table from column name.
For example, Adobe Photoshop and Photoshop should be shown as 'possible duplicates.'
Is this possible?
I'm looking to display all possible duplicate entries from my MySQL table from column name.
For example, Adobe Photoshop and Photoshop should be shown as 'possible duplicates.'
Is this possible?
You can start with using SOUNDEX(), this will probably do for what you need
The drawbacks of SOUNDEX() are:
Example:
SELECT SOUNDEX('Microsoft')
SELECT SOUNDEX('Microsift')
SELECT SOUNDEX('Microsift Corporation')
SELECT SOUNDEX('Microsift Subsidary')
/* all of these return 'M262' */
For more advanced needs, I think you need to look at the Levenshtein distance (also called "edit distance") of two strings and work with a threshold. This is the more complex (=slower) solution, but it allows for greater flexibility.
Main drawback is, that you need both strings to calculate the distance between them. With SOUNDEX you can store a pre-calculated SOUNDEX in your table and compare/sort/group/filter on that. With the Levenshtein distance, you might find that the difference between "Microsoft" and "Nzcrosoft" is only 2, but it will take a lot more time to come to that result.
In any case, an example Levenshtein distance function for MySQL can be found at Levenshtein Distance as a MySQL Stored Function .