Parsing Text Fragments

Aaron Bertrand looks at a way of speeding up LIKE %Something% queries and builds a fragment table:

It’s clear that in this specific case – with an address column of nvarchar(60) and a max length of 26 characters – breaking up each address into fragments can bring some relief to otherwise expensive “leading wildcard” searches. The better payoff seems to happen when the search pattern is larger and, as a result, more unique. I’ve also demonstrated why EXISTS is better in scenarios where multiple matches are possible – with a JOIN, you will get redundant output unless you add some “greatest n per group” logic.

Read the whole thing.  If you’re interested in the concept, I recommend reading up on n-grams, like Alan Burstein’s series and this TechNet article on implementing N-Grams in SQL Server.

Related Posts

Choosing A Hadoop Data Format

Silvia Oliveros has a set of considerations to help you choose a file format for your data in Hadoop: What does your pipeline look like, and what steps are involved? Some of the file formats were optimized to work in certain situations. For example, Sequence files were designed to easily share data between Map Reduce […]

Read More

Updating Large Tables In SQL Server And Oracle

Jana Sattainathan has a post on how he was able to move and update billions of rows, using both Oracle and SQL Server as examples: The key thing to remember with SQL Server is to convert to a non-integer value by using a “decimal” as shown in the above example with “10.”. This is the same […]

Read More

Categories

February 2017
MTWTFSS
« Jan Mar »
 12345
6789101112
13141516171819
20212223242526
2728