Nikhil Babar has an introduction to the Levenshtein distance algorithm:

The Levenshtein distance is a string metric for measuring the difference between two sequences. Informally, the Levenshtein distance between two words is the minimum number of single-character edits (i.e. insertions, deletions, or substitutions) required to change one word into the other. It is named after Vladimir Levenshtein, who introduced this distance in 1965.

Levenshtein distance may also be referred to as edit distance, although that term may also denote a larger family of distance metrics. It is closely related to pairwise string alignments.
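
To make the definition above concrete, here is a minimal sketch of the standard dynamic-programming approach in Python. It is not taken from the linked article; the function name `levenshtein` and the table name `dist` are chosen here purely for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b."""
    # dist[i][j] holds the distance between the prefixes a[:i] and b[:j].
    dist = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]

    # Turning a prefix of a into the empty string takes i deletions.
    for i in range(len(a) + 1):
        dist[i][0] = i
    # Turning the empty string into a prefix of b takes j insertions.
    for j in range(len(b) + 1):
        dist[0][j] = j

    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution, or free match
            )
    return dist[len(a)][len(b)]


print(levenshtein("kitten", "sitting"))  # 3
```

Going from "kitten" to "sitting" takes two substitutions (k to s, e to i) and one insertion (g), which is why the distance is 3.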

Read on for an explanation and example. Levenshtein distance is a great way of calculating string similarity, and it can help with tasks like data cleansing by finding typos or alternate spellings, or matching parts of street addresses.
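
As a rough illustration of the data cleansing use case, the sketch below maps a possibly misspelled value to the nearest entry in a list of known-good spellings. The helper names `edit_distance` and `closest_match`, the `max_distance` cutoff of 2, and the sample city names are all assumptions made for this example, not something from the linked article.

```python
from typing import Iterable, Optional


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, keeping only two rows of the table in memory."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution, or free match
            ))
        prev = curr
    return prev[-1]


def closest_match(value: str, candidates: Iterable[str],
                  max_distance: int = 2) -> Optional[str]:
    """Return the candidate nearest to value, or None if none is close enough."""
    best, best_dist = None, max_distance + 1
    for candidate in candidates:
        # Compare case-insensitively so "raleigh" and "Raleigh" count as equal.
        d = edit_distance(value.lower(), candidate.lower())
        if d < best_dist:
            best, best_dist = candidate, d
    return best


cities = ["Raleigh", "Durham", "Cary"]
print(closest_match("Raliegh", cities))  # Raleigh
print(closest_match("Zzzzzz", cities))   # None
```

Note that swapping two adjacent letters, as in "Raliegh", counts as two substitutions under plain Levenshtein distance, so the cutoff needs to be at least 2 to catch transposition typos.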

Kevin Feasel

2018-10-03

Data Science