Parsing Text Fragments

Aaron Bertrand looks at a way of speeding up LIKE %Something% queries and builds a fragment table:

It’s clear that in this specific case – with an address column of nvarchar(60) and a max length of 26 characters – breaking up each address into fragments can bring some relief to otherwise expensive “leading wildcard” searches. The better payoff seems to happen when the search pattern is larger and, as a result, more unique. I’ve also demonstrated why EXISTS is better in scenarios where multiple matches are possible – with a JOIN, you will get redundant output unless you add some “greatest n per group” logic.

Read the whole thing.  If you’re interested in the concept, I recommend reading up on n-grams, like Alan Burstein’s series and this TechNet article on implementing N-Grams in SQL Server.

Related Posts

Generating Realistic-Looking Data With Markov Chains

Phil Factor shows how to use Markov chain generation in T-SQL to generate realistic-looking country names: How did we do this? We started with a table that took each word, added two spaces at the beginning and a |, followed by two subsequent spaces, at the end. This allowed us to map the frequency of […]

Read More

Closure Tables: Graph Data In Relational Form

Phil Factor shows how to use the concept of closure tables to represent graph-style data in a relational database: Closure tables are plain ordinary relational tables that are designed to work easily with relational operations. It is true that useful extensions are provided for SQL Server to deal with hierarchies. The HIERARCHYID data type and the common […]

Read More

Categories

February 2017
MTWTFSS
« Jan Mar »
 12345
6789101112
13141516171819
20212223242526
2728