Parsing Text Fragments

Aaron Bertrand looks at a way of speeding up LIKE %Something% queries and builds a fragment table:

It’s clear that in this specific case – with an address column of nvarchar(60) and a max length of 26 characters – breaking up each address into fragments can bring some relief to otherwise expensive “leading wildcard” searches. The better payoff seems to happen when the search pattern is larger and, as a result, more unique. I’ve also demonstrated why EXISTS is better in scenarios where multiple matches are possible – with a JOIN, you will get redundant output unless you add some “greatest n per group” logic.

Read the whole thing.  If you’re interested in the concept, I recommend reading up on n-grams, like Alan Burstein’s series and this TechNet article on implementing N-Grams in SQL Server.

Related Posts

SQL Server’s Referential Integrity Operator

Joe Obbish explains the purpose of the referential integrity operator in SQL Server 2016: What would happen if a parent table was referenced by hundreds of child tables, such as for a date dimension table? Deleting or updating a row in the parent table would create a query plan with at least one join per […]

Read More

Everyone’s Data Is Dirty

Kevin Feasel

2017-11-16

Data

Chirag Shivalker hits the highlights on dirty data: It might sound a bit abrupt, but clean data is a myth. If your data is dirty, so is everyone else’s. Enterprises are more than dependent on data these days, and it is going to stay the same in coming years. They need to collect data in order […]

Read More

Categories

February 2017
MTWTFSS
« Jan Mar »
 12345
6789101112
13141516171819
20212223242526
2728