Regular Expressions Against Large Data Sets

Liz Bennett explains types of regular expressions which do not scale:

With recursive backtracking based regex engines, it is possible to craft regular expressions that match in exponential time with respect to the length of the input, whereas the Thompson NFA algorithm will always match in linear time. As the name would imply, the slower performance of the recursive backtracking algorithm is caused by the backtracking involved in processing input. This backtracking has serious consequences when working with regexes at a high scale because an inefficient regex can take orders of magnitude longer to match than an efficient regex. The standard regex engines in most modern languages, such as Java, Python, Perl, PHP, and JavaScript, use this recursive backtracking algorithm, so almost any modern solution involving regexes will be vulnerable to poorly performing regexes. Fortunately, though, in almost all cases, an inefficient regex can be optimized to be an efficient regex, potentially resulting in enormous savings in terms of CPU cycles.

There’s a significant performance difference, so if you work frequently with regular expressions, check this out.

Related Posts

Aggregate Pushdown with GROUP BY

Paul White takes us through several performance improvements around aggregate pushdown: SQL Server 2016 introduced serial batch mode processing and aggregate pushdown. When pushdown is successful, aggregation is performed within the Columnstore Scan operator itself, possibly operating directly on compressed data, and taking advantage of SIMD CPU instructions. The performance improvements possible with aggregate pushdown can be very […]

Read More

Power Query Container Size and Performance

Chris Web looks into what changing the Power BI Dataflow container size does for us: Currently there is no way to change this 256MB in Power BI Desktop or Excel although someone has already posted a suggestion on the Ideas site to allow us to change it. How much of an impact does this actually […]

Read More

Categories

September 2016
MTWTFSS
« Aug Oct »
 1234
567891011
12131415161718
19202122232425
2627282930