Cardinality Estimation And String Splits

Dan Holmes points out a quirk of estimated row counts with CLR-based functions:

That is an enormous amount of data.  What if you needed to sort that?  What if you joined this to another table or view and a spool was required.  What it it was a hash join and a memory grant was required?  The demand that this seemingly innocuous statement placed on your server could be overwhelming.

The memory grant could create system variability that is very difficult to find.  There is a thread on MSDN that I started which exposes what prompted this post.  (The plan that was causing much of the problem is at this link.)

It’s important to keep in mind the good enough “big round figures” that SQL Server uses for row estimation when stats are unavailable (e.g., linked server to Hive or a CLR function like in the post).  These estimates aren’t always correct, and there are edge cases like the one in the post in which the estimates will be radically wrong and begin to affect your server.

Related Posts

Character Columns And MAX Vs TOP+ORDER Differences

Kendra Little digs into a tricky performance problem: Most of the time in SQL Server, the MAX() function and a TOP(1) ORDER BY DESC will behave very similarly. If you give them a rowstore index leading on the column in question, they’re generally smart enough to go to the correct end of the index, and […]

Read More

Row Goals And Anti-Joins

Paul White continues his row goals series: The optimizer assumes that people write a semi join (indirectly e.g. using EXISTS) with the expectation that the row being searched for will be found. An apply semi join row goal is set by the optimizer to help find that expected matching row quickly. For anti join (expressed e.g. using NOT EXISTS) the optimizer’s assumption is that […]

Read More


February 2016
« Jan Mar »