Is there a package that provides a Levenshtein distance function implemented in C or Fortran? I have many strings to compare, and stringMatch from MiscPsycho is too slow for this.
34
4 Answers
21
stringdist in the stringdist package does it too, and is even faster than levenshteinDist under certain conditions (1).
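As a minimal sketch of what that looks like in practice, assuming the stringdist package is installed from CRAN: `method = "lv"` selects the plain Levenshtein distance. The example strings are made up for illustration.

```r
# Assumes the stringdist package is installed: install.packages("stringdist")
library(stringdist)

# Illustrative inputs (not from the original question)
a <- c("kitten", "flaw", "saturday")
b <- c("sitting", "lawn", "sunday")

# Element-wise Levenshtein distance between paired strings
pairwise <- stringdist(a, b, method = "lv")

# All-against-all distances as a length(a) x length(b) matrix
all_pairs <- stringdistmatrix(a, b, method = "lv")

print(pairwise)
print(all_pairs)
```

Both functions also take an `nthread` argument, which may be worth tuning when comparing many strings.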
Ben
stringdist has sped up significantly since that blog you link to: it now uses multiple cores. – Feb 26 '16 at 17:02
17
levenshteinDist (from the RecordLinkage package) calls compiled C code. Give it a try.
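A short sketch of the call, assuming RecordLinkage is installed and that `levenshteinDist` takes two character vectors and compares them element-wise; the inputs are illustrative only.

```r
# Assumes the RecordLinkage package is installed: install.packages("RecordLinkage")
library(RecordLinkage)

# Illustrative inputs (not from the original question)
x <- c("kitten", "flaw")
y <- c("sitting", "lawn")

# Element-wise Levenshtein distance, computed in compiled code
d <- levenshteinDist(x, y)
print(d)
```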
MichaelChirico
gd047
Just noting the RecordLinkage package is apparently no longer maintained and has been pulled from CRAN. The `stringdist` package is the solution now. – Brian Stamper Feb 27 '20 at 17:42
Just noting the RecordLinkage package has *not* been pulled from CRAN; it's still available: https://cran.r-project.org/web/packages/RecordLinkage/ – MS Berends Aug 12 '22 at 19:41
6
You could try stringDist from the Biostrings package as well.
MichaelChirico
Aaron Statham
1
You could also use levenshtein_distance() from the textTinyR package. I got 'calloc' memory errors with all other packages when it came to larger character vectors of around 30k characters. Only textTinyR worked for me!
interrobang