Levenshtein Distances

Peter Coates provides an extremely fast estimate of Levenshtein Distance:

If your application requires a precise LD value, this heuristic isn’t for you, but the estimates are typically within about 0.05 of the true distance, which is more than enough accuracy for such tasks as:

  • Confirming suspected near-duplication.

  • Estimating how much two document vary.

  • Filtering through large numbers of documents to look for a near-match to some substantial block of text.

The estimation process is pretty interesting.  Worth a read.

Related Posts

Matrix Transposition In T-SQL

Phil Factor has some fun transposing a matrix using T-SQL: What I’m doing is simply converting the table into its JSON form, and then using this to create a table using the multi-row VALUES  syntax which paradoxically allows expressions. The expression I’m using is JSON_Value, which allows me do effectively dictate the source within the table, via […]

Read More

Taking A Random Walk

Dan Goldstein describes the basics of Brownian motion: I was sitting in a bagel shop on Saturday with my 9 year old daughter. We had brought along hexagonal graph paper and a six sided die. We decided that we would choose a hexagon in the middle of the page and then roll the die to […]

Read More

Categories

September 2016
MTWTFSS
« Aug Oct »
 1234
567891011
12131415161718
19202122232425
2627282930