Peter Coates provides an extremely fast estimate of Levenshtein Distance:
If your application requires a precise LD value, this heuristic isn’t for you, but the estimates are typically within about 0.05 of the true distance, which is more than enough accuracy for such tasks as:
- Confirming suspected near-duplication.
- Estimating how much two documents vary.
- Filtering through large numbers of documents to look for a near-match to some substantial block of text.
The estimation process is pretty interesting. Worth a read.
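For context on what is being estimated, here is a minimal Python sketch of the exact Levenshtein Distance using the classic two-row dynamic program. This is not Coates's estimation heuristic, only the precise baseline it approximates, and the function name `levenshtein` is my own choice for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Exact Levenshtein distance via the two-row dynamic program."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop to save memory

    # previous[j] holds the distance between the processed prefix of a and b[:j]
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))
```

The running time grows with the product of the two string lengths, which is the reason a fast estimate is attractive when comparing large documents.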