Regular Expressions Against Large Data Sets

Liz Bennett explains types of regular expressions which do not scale:

With recursive backtracking based regex engines, it is possible to craft regular expressions that match in exponential time with respect to the length of the input, whereas the Thompson NFA algorithm will always match in linear time. As the name would imply, the slower performance of the recursive backtracking algorithm is caused by the backtracking involved in processing input. This backtracking has serious consequences when working with regexes at a high scale because an inefficient regex can take orders of magnitude longer to match than an efficient regex. The standard regex engines in most modern languages, such as Java, Python, Perl, PHP, and JavaScript, use this recursive backtracking algorithm, so almost any modern solution involving regexes will be vulnerable to poorly performing regexes. Fortunately, though, in almost all cases, an inefficient regex can be optimized to be an efficient regex, potentially resulting in enormous savings in terms of CPU cycles.

There’s a significant performance difference, so if you work frequently with regular expressions, check this out.

Related Posts

Powershell Speed Testing

Shane O’Neill shows off a Powershell script which allows you to simplify performance testing: Apart from catching up on news during my commute I only really use notifications for a certain number of hashtags i.e. #SqlServer, #tsql2sday, #sqlhelp, and #PowerShell. So during work, every so often a little notification will pop up on the bottom […]

Read More

Legacy Cardinality Estimation In SQL Server

Kellyn Pot’vin-Gorman explains what the Legacy Cardinality Estimation setting does in SQL Server: Oracle DBAs have used the CARDINALITY hint for some time and it should be understood that this may appear to be similar, but is actually quite different.  As hinting in TSQL is a bit different than PL/SQL, we can compare similar queries […]

Read More


September 2016
« Aug Oct »