Parsing Text Fragments

Aaron Bertrand looks at a way of speeding up LIKE %Something% queries and builds a fragment table:

It’s clear that in this specific case – with an address column of nvarchar(60) and a max length of 26 characters – breaking up each address into fragments can bring some relief to otherwise expensive “leading wildcard” searches. The better payoff seems to happen when the search pattern is larger and, as a result, more unique. I’ve also demonstrated why EXISTS is better in scenarios where multiple matches are possible – with a JOIN, you will get redundant output unless you add some “greatest n per group” logic.

Read the whole thing.  If you’re interested in the concept, I recommend reading up on n-grams, like Alan Burstein’s series and this TechNet article on implementing N-Grams in SQL Server.

Related Posts

Master Data Services No Longer Uses Silverlight

Niko Neugebauer is happy about an update to Master Data Services in SQL Server 2019: Before we continue, let me ask you one question, have you heard about Silverlight? Or in other words, and with a kind of evil voice “DID YOU EVER INSTALLED SILVERLIGHT ON A PRODUCTION SERVER”?. If you have worked with MDS oh […]

Read More

Chicago Parking Ticket Data Set

Kevin Feasel

2018-09-26

Data

Bob Pusateri shows us a new data set to mess with: A few weeks ago I came across this blog post by Matt Chapman. Matt filed FOIA requests with the City of Chicago and, after multiple attempts, was able to get access to over 36 million parking tickets written between 2003 and 2016. Matt goes on to explain Chicago’s […]

Read More

Categories

February 2017
MTWTFSS
« Jan Mar »
 12345
6789101112
13141516171819
20212223242526
2728