Cardinality Estimation And String Splits

Dan Holmes points out a quirk of estimated row counts with CLR-based functions:

That is an enormous amount of data.  What if you needed to sort that?  What if you joined this to another table or view and a spool was required.  What it it was a hash join and a memory grant was required?  The demand that this seemingly innocuous statement placed on your server could be overwhelming.

The memory grant could create system variability that is very difficult to find.  There is a thread on MSDN that I started which exposes what prompted this post.  (The plan that was causing much of the problem is at this link.)

It’s important to keep in mind the good enough “big round figures” that SQL Server uses for row estimation when stats are unavailable (e.g., linked server to Hive or a CLR function like in the post).  These estimates aren’t always correct, and there are edge cases like the one in the post in which the estimates will be radically wrong and begin to affect your server.

Related Posts

Join Elimination

Lukas Eder has a nice post explaining different forms of automatic join elimination: We intended to fetch all customers and their addresses. But observe: We project only columns from the CUSTOMER table and we don’t have any predicates at all, specifically not predicates using the ADDRESS table. So, we’re completely ignoring any contributions from the ADDRESS table. We never really needed […]

Read More

The Pain Of Multi-Statement TVFs

Andy Mallon walks through a multi-statement table-valued function in Microsoft Dynamics CRM: Look at all those table-valued function calls! Followed immediately by a really expensive hash match. My Spidey Sense started to tingle. What is fn_GetMaxPrivilegeDepthMask, and why is it being called 30 times? I bet this is a problem. When you see “Table-valued function” as an operator […]

Read More


February 2016
« Jan Mar »