Regular Expressions Against Large Data Sets

Liz Bennett explains types of regular expressions which do not scale:

With recursive backtracking based regex engines, it is possible to craft regular expressions that match in exponential time with respect to the length of the input, whereas the Thompson NFA algorithm will always match in linear time. As the name would imply, the slower performance of the recursive backtracking algorithm is caused by the backtracking involved in processing input. This backtracking has serious consequences when working with regexes at a high scale because an inefficient regex can take orders of magnitude longer to match than an efficient regex. The standard regex engines in most modern languages, such as Java, Python, Perl, PHP, and JavaScript, use this recursive backtracking algorithm, so almost any modern solution involving regexes will be vulnerable to poorly performing regexes. Fortunately, though, in almost all cases, an inefficient regex can be optimized to be an efficient regex, potentially resulting in enormous savings in terms of CPU cycles.

There’s a significant performance difference, so if you work frequently with regular expressions, check this out.

Related Posts

Capturing Implicit Conversions With Extended Events

Grant Fritchey shows how easy it is to build an extended event which captures implicit conversions: Built right into the Extended Events is an event that captures conversions that would affect execution plans, plan_affecting_convert. This event will show both CONVERT and CONVERT_IMPLICIT warnings that you would normally only see within an execution plan. You can […]

Read More

Table Variable Deferred Compilation: When It Works

Milos Radivojevic gives us a good example of when table variable deferred compilation is a good thing: As mentioned in the previous article, SQL Server 2019 cardinality estimations for a table variable are based on actual table variable row counts. Therefore, in SQL Server 2019, we should expect better estimations and better plans for queries […]

Read More

Categories

September 2016
MTWTFSS
« Aug Oct »
 1234
567891011
12131415161718
19202122232425
2627282930