Regular Expressions Against Large Data Sets

Liz Bennett explains types of regular expressions which do not scale:

With recursive backtracking based regex engines, it is possible to craft regular expressions that match in exponential time with respect to the length of the input, whereas the Thompson NFA algorithm will always match in linear time. As the name would imply, the slower performance of the recursive backtracking algorithm is caused by the backtracking involved in processing input. This backtracking has serious consequences when working with regexes at a high scale because an inefficient regex can take orders of magnitude longer to match than an efficient regex. The standard regex engines in most modern languages, such as Java, Python, Perl, PHP, and JavaScript, use this recursive backtracking algorithm, so almost any modern solution involving regexes will be vulnerable to poorly performing regexes. Fortunately, though, in almost all cases, an inefficient regex can be optimized to be an efficient regex, potentially resulting in enormous savings in terms of CPU cycles.

There’s a significant performance difference, so if you work frequently with regular expressions, check this out.

Related Posts

Batch Mode Normalization

Paul White digs into batch mode normalization and its consequences for performance: I mentioned in the introduction that not all eight-byte data types can fit in 64 bits. This fact is important because many columnstore and batch mode performance optimizations only work with data 64 bits in size. Aggregate pushdown is one of those things. There are […]

Read More

Comparing CAST and CONVERT Performance

Max Vernon runs a performance test of CAST versus CONVERT: This post is a follow-up to my prior post inspecting the performance of PARSE vs CAST & CONVERT, where we see that PARSE is an order of magnitude slower than CONVERT. In this post, we’ll check if there is a similar difference between using CAST or CONVERT. But just to be clear, CONVERT offers […]

Read More

Categories

September 2016
MTWTFSS
« Aug Oct »
 1234
567891011
12131415161718
19202122232425
2627282930