Parsing Text Fragments

Aaron Bertrand looks at a way of speeding up LIKE %Something% queries and builds a fragment table:

It’s clear that in this specific case – with an address column of nvarchar(60) and a max length of 26 characters – breaking up each address into fragments can bring some relief to otherwise expensive “leading wildcard” searches. The better payoff seems to happen when the search pattern is larger and, as a result, more unique. I’ve also demonstrated why EXISTS is better in scenarios where multiple matches are possible – with a JOIN, you will get redundant output unless you add some “greatest n per group” logic.

Read the whole thing.  If you’re interested in the concept, I recommend reading up on n-grams, like Alan Burstein’s series and this TechNet article on implementing N-Grams in SQL Server.

Related Posts

A Forensic Accounting Case Study

I have a new series I’ve started on applying forensic accounting techniques as a data platform specialist: Before I dig into my case study, I want to make it absolutely clear that these techniques will help you do a lot more than uncover fraud in your environment. My hope is that there is no fraud […]

Read More

Data Modeling: Not Everything Is VARCHAR(8000)

Bert Wagner gives us several reasons why VARCHAR(8000) can be a bad fit for data types: When first learning SQL Server, I had the brilliant idea of defining every column on my table as VARCHAR(8000). After all, the VARCHAR datatype only consumes as much storage space as the string itself (plus two bytes), so even if […]

Read More

Categories

February 2017
MTWTFSS
« Jan Mar »
 12345
6789101112
13141516171819
20212223242526
2728