Regular Expressions In Lucene

Kendra Little looks at Azure Search searches:

I wanted to be able to find all architect jobs using something like ‘%rchit%’ as well, because there’s not a lot of great ways to do this in SQL Server.

In SQL Server, you can use a traditional B-Tree index to seek, but only based on the letters at the beginning of a character column.  If I want to know every business title that contains ‘%rchit%’, I’m going to have to scan an entire index.

SQL Server fulltext indexes don’t solve the double-wildcard problem, either. Fulltext indexes support word prefix searches– so a fulltext index would be great at finding all job titles that contain a word that starts with ‘Arch%’.

Sometimes that’s enough. But a lot of times, you do need to find a substring anywhere in a word. And sometimes you do want to offload that from your database.

This is the kind of problem Lucene (and its follow-up implementations, like Elasticsearch) was designed to solve.  Read on for more details as Kendra solves the problem in Azure Search.

Related Posts

Working With Azure SQL Managed Instances

Jovan Popovic has a couple of posts covering configuration for Azure SQL Managed Instances.  First, he looks at how to configure tempdb: One limitation in the current public preview is that tempdb don’t preserves custom settings after fail-over happens. If you add new files to tempdb or change file size, these settings will not be preserved after fail-over, and […]

Read More

Flushing The Authentication Cache

Arun Sirpal describes an Azure SQL DB-only DBCC command: This command only applies to Azure SQL Database, at a high level it empties the database authentication cache for logins and firewall rules for the current USER database. In Azure SQL Database the authentication cache makes a copy of logins and server firewall rules which are […]

Read More


October 2016
« Sep Nov »