Regular Expressions In Lucene

Kendra Little looks at wildcard and regular expression searches in Azure Search:

I wanted to be able to find all architect jobs using something like ‘%rchit%’ as well, because there aren’t a lot of great ways to do this in SQL Server.

In SQL Server, you can use a traditional B-Tree index to seek, but only based on the letters at the beginning of a character column.  If I want to know every business title that contains ‘%rchit%’, I’m going to have to scan an entire index.
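To show why a leading wildcard defeats the seek, here is a minimal Python sketch that uses a sorted list as a stand-in for the ordered keys of a B-tree index. The job titles and the helper names `prefix_seek` and `substring_scan` are hypothetical. Comparisons here are case-sensitive, unlike the case-insensitive collations SQL Server often uses.

```python
from bisect import bisect_left

# Hypothetical job titles, kept in sorted order the way a B-tree index orders its keys.
titles = sorted([
    "Accountant",
    "Architect",
    "Architectural Drafter",
    "Chief Architect",
    "Data Engineer",
    "Landscape Architect",
])

def prefix_seek(sorted_keys, prefix):
    """Models LIKE 'prefix%': binary-search to the first candidate key,
    then read forward only while keys still start with the prefix."""
    start = bisect_left(sorted_keys, prefix)
    matches = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break  # sort order guarantees no later key can match
        matches.append(key)
    return matches

def substring_scan(sorted_keys, fragment):
    """Models LIKE '%fragment%': sort order gives no starting point,
    so every key has to be examined."""
    return [key for key in sorted_keys if fragment in key]

print(prefix_seek(titles, "Arch"))
print(substring_scan(titles, "rchit"))
```

The prefix lookup touches only a contiguous run of keys after one binary search. The substring lookup must visit every entry, which is the full index scan described above.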

SQL Server fulltext indexes don’t solve the double-wildcard problem, either. Fulltext indexes support word prefix searches, so a fulltext index would be great at finding all job titles that contain a word starting with ‘Arch%’.
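The sketch below is a toy word-level inverted index, not SQL Server's actual full-text engine. It shows why word prefix searches are cheap but mid-word fragments are not. In SQL Server itself, a word prefix search is written as a `CONTAINS` prefix term such as `'"arch*"'`. The titles and the `word_prefix_search` helper are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict

# Hypothetical job titles.
titles = ["Chief Architect", "Landscape Architect", "Data Engineer", "Program Manager"]

# Inverted index: each lowercased word maps to the ids of the titles containing it.
index = defaultdict(set)
for doc_id, title in enumerate(titles):
    for word in title.lower().split():
        index[word].add(doc_id)

# Sorted term dictionary, so a word prefix becomes a range lookup.
terms = sorted(index)

def word_prefix_search(prefix):
    """Return titles containing a word that starts with `prefix`."""
    prefix = prefix.lower()
    hits = set()
    i = bisect_left(terms, prefix)
    while i < len(terms) and terms[i].startswith(prefix):
        hits |= index[terms[i]]
        i += 1
    return sorted(titles[d] for d in hits)

print(word_prefix_search("Arch"))   # finds words beginning with "arch"
print(word_prefix_search("rchit"))  # no word begins with "rchit"
```

Because the index is keyed on whole words, "rchit" in the middle of "architect" has no range to look up, so the prefix trick does not help with a double wildcard.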

Sometimes that’s enough. But a lot of times, you do need to find a substring anywhere in a word. And sometimes you do want to offload that from your database.

This is the kind of problem Lucene (and search engines built on top of it, like Elasticsearch) was designed to solve. Read on for more details as Kendra solves the problem in Azure Search.
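Lucene's query syntax accepts regular expressions between forward slashes, for example `/.*rchit.*/`. These expressions are matched against whole indexed terms, so the pattern needs `.*` on both sides. The sketch below models that idea in Python under simplifying assumptions. The regex runs against the distinct terms in a toy term dictionary, and the postings of every matching term are then merged. Python's `re.fullmatch` stands in for Lucene's automaton-based matching, and the titles and the `regexp_search` helper are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical job titles.
titles = ["Chief Architect", "Landscape Architect", "Data Engineer", "Archivist"]

# Term dictionary: each distinct lowercased token maps to the ids of titles containing it.
postings = defaultdict(set)
for doc_id, title in enumerate(titles):
    for term in title.lower().split():
        postings[term].add(doc_id)

def regexp_search(pattern):
    """Match `pattern` against whole terms (anchored, like Lucene regexes),
    then union the postings of every term that matched."""
    compiled = re.compile(pattern)
    hits = set()
    for term, doc_ids in postings.items():
        if compiled.fullmatch(term):
            hits |= doc_ids
    return sorted(titles[d] for d in hits)

print(regexp_search(r".*rchit.*"))  # "architect" matches; "archivist" does not
print(regexp_search(r"arch.*"))     # prefix-style pattern also picks up "archivist"
```

The regex is checked once per distinct term rather than once per row, which is typically a much smaller set when many documents share the same words.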
