Distance Between Strings: Levenshtein Distance

Nikhil Babar has an introduction to the Levenshtein distance algorithm:

The Levenshtein distance is a string metric for measuring the difference between two sequences. Informally, the Levenshtein distance between two words is the minimum number of single-character edits (i.e. insertions, deletions, or substitutions) required to change one word into the other. It is named after Vladimir Levenshtein, who discovered this equation in 1965.

Levenshtein distance may also be referred to as edit distance, although it may also denote a larger family of distance metrics. It is closely related to pairwise string alignments.

Read on for an explanation and example.  Levenshtein is a great way of calculating string similarities, possibly helping you with tasks like data cleansing by finding typos or alternate spellings, or matching down parts of street addresses.

Related Posts

Data Science And Data Engineering In HDP 3.0

Saumitra Buragohain, et al, show off some of the things added to the Hortonworks Data Platform for data scientists and data engineers: We leverage the power of HDP 3.0 from efficient storage (erasure coding), GPU pooling to containerized TensorFlow and Zeppelin to enable this use case. We will the save the details for a different […]

Read More

Multi-Threaded R With Microsoft R Client

David Parr shows us how to get started with Microsoft R Client and performs some quick benchmarking: This message will pop up, and it’s worth noting as it’s got some information in it that you might need to think about: It’s worth noting that right now Microsoft r Client is lagging behind the current R version, and […]

Read More

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Categories

October 2018
MTWTFSS
« Sep  
1234567
891011121314
15161718192021
22232425262728
293031