Cardinality Estimation And String Splits

Dan Holmes points out a quirk of estimated row counts with CLR-based functions:

That is an enormous amount of data.  What if you needed to sort that?  What if you joined this to another table or view and a spool was required?  What if it was a hash join and a memory grant was required?  The demand that this seemingly innocuous statement placed on your server could be overwhelming.

The memory grant could create system variability that is very difficult to find.  There is a thread on MSDN that I started which exposes what prompted this post.  (The plan that was causing much of the problem is at this link.)

It’s important to keep in mind the good-enough “big round figures” that SQL Server falls back on for row estimation when statistics are unavailable (e.g., a linked server to Hive or a CLR function like the one in the post).  These estimates aren’t always correct, and in edge cases like the one in the post they can be radically wrong and begin to affect your server.
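To make the scale of the problem concrete, here is a rough back-of-the-envelope sketch in Python. It is not SQL Server's actual costing formula, and the fixed row guess and row width are hypothetical values chosen for illustration, not figures from the post. It only shows how a fixed per-call guess multiplies out when a function is called once per outer row.

```python
# Illustrative arithmetic only. Both constants are assumed values,
# not figures taken from the post or from SQL Server documentation.
FIXED_ROW_GUESS = 1_000    # hypothetical fixed row estimate used when no statistics exist
ASSUMED_ROW_BYTES = 4_000  # hypothetical estimated width of one output row, in bytes


def estimated_bytes(outer_rows: int) -> int:
    """Data volume implied by the fixed guess if the function runs once per outer row."""
    return outer_rows * FIXED_ROW_GUESS * ASSUMED_ROW_BYTES


def overestimate_factor(actual_rows_per_call: int) -> float:
    """How many times larger the fixed guess is than the real row count per call."""
    return FIXED_ROW_GUESS / actual_rows_per_call


if __name__ == "__main__":
    # Example: 10,000 outer rows, where each split really returns only 5 rows.
    planned = estimated_bytes(10_000)
    print(f"planned for roughly {planned / 1024**3:.1f} GiB")
    print(f"overestimated by a factor of {overestimate_factor(5):.0f}")
```

Under these made-up numbers, a sort or hash join sized from the estimate would plan for far more data than actually arrives, which is the kind of gap the post describes.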
