Distance Between Strings: Levenshtein Distance

Nikhil Babar has an introduction to the Levenshtein distance algorithm:

The Levenshtein distance is a string metric for measuring the difference between two sequences. Informally, the Levenshtein distance between two words is the minimum number of single-character edits (insertions, deletions, or substitutions) required to change one word into the other. It is named after Vladimir Levenshtein, who introduced this metric in 1965.
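To make the definition concrete, here is a minimal sketch of one common way to compute the distance: a dynamic-programming table filled row by row (often called the Wagner-Fischer approach). This is not taken from the linked post; the function name `levenshtein` and the two-row layout are choices made here for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b."""
    # Keep the shorter string in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a

    # previous[j] is the distance between the empty string and b[:j].
    previous = list(range(len(b) + 1))

    for i, char_a in enumerate(a, start=1):
        # current[0] is the distance between a[:i] and the empty string.
        current = [i]
        for j, char_b in enumerate(b, start=1):
            insert_cost = current[j - 1] + 1
            delete_cost = previous[j] + 1
            # Substitution is free when the characters already match.
            substitute_cost = previous[j - 1] + (char_a != char_b)
            current.append(min(insert_cost, delete_cost, substitute_cost))
        previous = current

    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

Turning "kitten" into "sitting" takes two substitutions (k to s, e to i) and one insertion (g), which is why the result is 3. Only two rows are kept in memory at a time, so space grows with the shorter string rather than with the product of both lengths.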

Levenshtein distance may also be referred to as edit distance, although that term may also denote a larger family of distance metrics. It is closely related to pairwise string alignments.

Read on for an explanation and example. Levenshtein distance is a great way of calculating string similarity, and it can help with tasks like data cleansing by finding typos or alternate spellings, or by matching parts of street addresses.
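As a rough illustration of the data cleansing use case, the sketch below picks the closest known value for a possibly misspelled input. The helper names `edit_distance` and `closest_match`, the sample street names, and the cutoff of 2 edits are all assumptions made for this example, not anything from the linked post.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one row at a time."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                current[j - 1] + 1,                    # insertion
                previous[j] + 1,                       # deletion
                previous[j - 1] + (char_a != char_b),  # substitution
            ))
        previous = current
    return previous[-1]


def closest_match(value, candidates, max_distance=2):
    """Return the candidate nearest to value, or None if none is close enough.

    Comparison ignores case; max_distance is an arbitrary illustrative cutoff.
    """
    best, best_distance = None, max_distance + 1
    for candidate in candidates:
        distance = edit_distance(value.lower(), candidate.lower())
        if distance < best_distance:
            best, best_distance = candidate, distance
    return best


# Hypothetical reference list of clean street names.
streets = ["Main Street", "Maine Avenue", "Market Street"]
print(closest_match("Main Stret", streets))  # Main Street
```

A fixed cutoff is a blunt instrument: two edits is a lot for a four-letter word and very little for a long address, so in practice the threshold may need to scale with string length.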
