Levenshtein Distances – Curated SQL

Peter Coates provides an extremely fast estimate of Levenshtein Distance:

If your application requires a precise LD value, this heuristic isn’t for you, but the estimates are typically within about 0.05 of the true distance, which is more than enough accuracy for such tasks as:

Confirming suspected near-duplication.
Estimating how much two document vary.
Filtering through large numbers of documents to look for a near-match to some substantial block of text.

The estimation process is pretty interesting. Worth a read.

M	T	W	T	F	S	S
			1	2	3	4
5	6	7	8	9	10	11
12	13	14	15	16	17	18
19	20	21	22	23	24	25
26	27	28	29	30