Trigram Search In SQL Server

Paul White shows how to implement trigram wildcard searches in SQL Server:

The basic idea of a trigram search is quite simple:

  1. Persist three-character substrings (trigrams) of the target data.
  2. Split the search term(s) into trigrams.
  3. Match search trigrams against the stored trigrams (equality search)
  4. Intersect the qualified rows to find strings that match all trigrams
  5. Apply the original search filter to the much-reduced intersection

We will work through an example to see exactly how this all works, and what the trade-offs are.

A must-read.  N-grams in SQL Server is an example of a non-obvious data architecture which performs much better than the obvious alternative, at least when the conditions are right.

Related Posts

A Frugal Stretch Database Alternative

Chris Bell shares a version of Stretch databases for people with budgets: Stretch databases were going to provide “Cost-effective” availability for cold data, and unlike typical cold data storage,  our data would always be online and available to query. Applications would not need to be modified to work with the seamless design of the stretch […]

Read More

Use Cases For Apache Kafka

Amy Boyle shows a few scenarios where New Relic uses Apache Kafka: The Events Pipeline team is responsible for plumbing some of New Relic’s core data streams-specifically, event data. These are fine-grained nuggets of monitoring data that record a single event at a particular moment in time. For example, an event could be an error thrown […]

Read More

Categories

September 2017
MTWTFSS
« Aug Oct »
 123
45678910
11121314151617
18192021222324
252627282930