Regular Expressions Against Large Data Sets

Liz Bennett explains which kinds of regular expressions do not scale:

With recursive backtracking based regex engines, it is possible to craft regular expressions that match in exponential time with respect to the length of the input, whereas the Thompson NFA algorithm will always match in linear time. As the name would imply, the slower performance of the recursive backtracking algorithm is caused by the backtracking involved in processing input. This backtracking has serious consequences when working with regexes at a high scale because an inefficient regex can take orders of magnitude longer to match than an efficient regex. The standard regex engines in most modern languages, such as Java, Python, Perl, PHP, and JavaScript, use this recursive backtracking algorithm, so almost any modern solution involving regexes will be vulnerable to poorly performing regexes. Fortunately, though, in almost all cases, an inefficient regex can be optimized to be an efficient regex, potentially resulting in enormous savings in terms of CPU cycles.
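To make the backtracking cost concrete, here is a minimal sketch using Python's standard `re` module, which is one of the backtracking engines named above. The two patterns and the `time_match` helper are illustrative choices of mine, not taken from Bennett's article. Both patterns accept the same strings (one or more `a` characters), but the nested quantifier in the first gives the engine many ways to split the input, and it tries all of them before reporting a failed match.

```python
import re
import time

# Nested quantifiers: a run of "a"s can be divided between the inner and
# outer "+" in exponentially many ways, and a backtracking engine tries
# them all before it gives up on a non-matching input.
SLOW = re.compile(r"^(a+)+$")

# Accepts the same strings, but there is only one way to consume the run.
FAST = re.compile(r"^a+$")


def time_match(pattern, text):
    """Return (matched, elapsed_seconds) for a single match attempt."""
    start = time.perf_counter()
    matched = pattern.match(text) is not None
    return matched, time.perf_counter() - start


if __name__ == "__main__":
    # Input lengths are kept small on purpose: the slow pattern's running
    # time grows rapidly with each additional character.
    for n in (10, 14, 18):
        text = "a" * n + "b"  # the trailing "b" forces the match to fail
        _, slow = time_match(SLOW, text)
        _, fast = time_match(FAST, text)
        print(f"n={n:2d}  slow={slow:.6f}s  fast={fast:.6f}s")
```

Exact timings depend on the machine and Python version, so none are shown here. The point of the comparison is the trend: the gap between the two columns should widen quickly as `n` grows, even though both patterns give the same answer.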

The performance difference is significant, so if you work with regular expressions frequently, this is worth reading.
