Implementing SoundEx

Kevin Feasel

2016-06-15

Search

Dror Helper shows how to implement SoundEx in C#:

It’s fairly easy to follow the steps of the algorithm (as defined by Wikipedia):

  1. Retain the first letter of the name and drop all other occurrences of a, e, i, o, u, y, h, w.

  2. Replace consonants with digits as follows (after the first letter):

    • b, f, p, v → 1

    • c, g, j, k, q, s, x, z → 2

    • d, t → 3

    • l → 4

    • m, n → 5

    • r → 6

  3. If two or more letters with the same number are adjacent in the original name (before step 1), only retain the first letter; also two letters with the same number separated by ‘h’ or ‘w’ are coded as a single number, whereas such letters separated by a vowel are coded twice. This rule also applies to the first letter.

  4. If the word has too few letters to assign three numbers, append zeros until there are three numbers. If there are more than three numbers, retain only the first three.
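The four steps above can be sketched in a few lines of Python. This is a minimal illustration of the rules as quoted, not Dror Helper's C# code; the names `CODES` and `soundex` are made up for this example, and it assumes plain ASCII names, silently ignoring any other characters.

```python
# Step 2: map each coded consonant to its digit.
CODES = {}
for letters, digit in (
    ("bfpv", "1"),
    ("cgjkqsxz", "2"),
    ("dt", "3"),
    ("l", "4"),
    ("mn", "5"),
    ("r", "6"),
):
    for ch in letters:
        CODES[ch] = digit


def soundex(name: str) -> str:
    """Return a letter plus three digits, or "" if name has no ASCII letters."""
    # Assumption: only ASCII letters matter; everything else is skipped.
    letters = [ch for ch in name.lower() if ch.isascii() and ch.isalpha()]
    if not letters:
        return ""

    first = letters[0]
    digits = []
    # Step 3 also applies to the first letter, so start from its code.
    prev = CODES.get(first, "")

    for ch in letters[1:]:
        if ch in "hw":
            # 'h' and 'w' do not separate letters with the same code.
            continue
        code = CODES.get(ch, "")  # vowels and 'y' have no code
        if code and code != prev:
            digits.append(code)
        # A vowel resets prev to "", so a repeated code after it counts again.
        prev = code

    # Step 4: pad with zeros, then keep the first letter and three digits.
    return (first.upper() + "".join(digits) + "000")[:4]


if __name__ == "__main__":
    for example in ("Robert", "Rupert", "Ashcraft", "Tymczak", "Pfister"):
        print(example, soundex(example))
```

Tracking the previous code in `prev` handles step 3 without a separate pass: skipping 'h' and 'w' leaves `prev` untouched, while a vowel clears it, which is what lets the same digit appear twice when a vowel sits between the two consonants.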

SQL Server also supports SOUNDEX as a built-in function.

