Trigram Search In SQL Server

Paul White shows how to implement trigram wildcard searches in SQL Server:

The basic idea of a trigram search is quite simple:

  1. Persist three-character substrings (trigrams) of the target data.
  2. Split the search term(s) into trigrams.
  3. Match search trigrams against the stored trigrams (equality search)
  4. Intersect the qualified rows to find strings that match all trigrams
  5. Apply the original search filter to the much-reduced intersection

We will work through an example to see exactly how this all works, and what the trade-offs are.

A must-read.  N-grams in SQL Server is an example of a non-obvious data architecture which performs much better than the obvious alternative, at least when the conditions are right.

Related Posts

Hooking SQL Server to Kafka

Niels Berglund has an interesting scenario for us: We see how the procedure in Code Snippet 2 takes relevant gameplay details and inserts them into the dbo.tb_GamePlay table. In our scenario, we want to stream the individual gameplay events, but we cannot alter the services which generate the gameplay. We instead decide to generate the event from the database […]

Read More

Choosing Clustered Index Columns

Ed Elliott wades into the clustered index debate: I have seen this debated in forums spread over the internet for decades, and the advice that we gave ten years ago isn’t as valid today as it was then. Ten years ago, memory was considerably less, and disks were spinning rust. The advent of SSD’s and […]

Read More

Categories

September 2017
MTWTFSS
« Aug Oct »
 123
45678910
11121314151617
18192021222324
252627282930