Parsing Text Fragments

Aaron Bertrand looks at a way of speeding up LIKE %Something% queries and builds a fragment table:

It’s clear that in this specific case – with an address column of nvarchar(60) and a max length of 26 characters – breaking up each address into fragments can bring some relief to otherwise expensive “leading wildcard” searches. The better payoff seems to happen when the search pattern is larger and, as a result, more unique. I’ve also demonstrated why EXISTS is better in scenarios where multiple matches are possible – with a JOIN, you will get redundant output unless you add some “greatest n per group” logic.

Read the whole thing.  If you’re interested in the concept, I recommend reading up on n-grams, like Alan Burstein’s series and this TechNet article on implementing N-Grams in SQL Server.

Related Posts

The Importance Of Action

Jesse Seymour has relaunched his blog and started with a controversial statement: There is no value in data. If you’re still here, then I am assuming you either a) believe I have a valid point, or b) just want to see how crazy I am for opening my new data blog with a post spouting […]

Read More

Fuzzy Searches In SQL Server

Phil Factor wants fuzzy searches done inside the relational database: Many times in the past, I’ve had arguments with members of the development teams who, when we are discussing fuzzy searches, draw themselves up to their full height, look dignified, and say that a relational database is no place to be doing fuzzy searches or […]

Read More

Categories

February 2017
MTWTFSS
« Jan  
 12345
6789101112
13141516171819
20212223242526
2728