Paul White shows how to implement trigram wildcard searches in SQL Server:
The basic idea of a trigram search is quite simple:
- Persist three-character substrings (trigrams) of the target data.
- Split the search term(s) into trigrams.
- Match search trigrams against the stored trigrams (equality search)
- Intersect the qualified rows to find strings that match all trigrams
- Apply the original search filter to the much-reduced intersection
We will work through an example to see exactly how this all works, and what the trade-offs are.
A must-read. N-grams in SQL Server is an example of a non-obvious data architecture which performs much better than the obvious alternative, at least when the conditions are right.
Comments closed