Hash Join Performance

Paul White wrote something.  That’s good enough to get tagged here:

It might be tempting to ask why the optimizer does not routinely add null-rejecting filters prior to equality joins. One can only suppose that this would not be beneficial in enough common cases. Most joins are not expected to encounter many null = null rejections, and adding predicates routinely could quickly become counter-productive, particularly if many join columns are present. For most joins, rejecting nulls inside the join operator is probably a better option (from a cost model perspective) than introducing an explicit Filter.

It does seem that there is an effort to prevent the very worst cases from manifesting through the post-optimization rewrite designed to reject null join rows before they reach the build input of a hash join. It seems that an unfortunate interaction exists between the effect of optimized bitmap filters and the application of this rewrite. It is also unfortunate that when this performance problem does occur, it is very difficult to diagnose from the execution plan alone.

I’m only on reading #2 of the post, so I’ve got a few more things yet to learn from it.

Get Started With U-SQL

Microsoft is pushing U-SQL pretty hard.  Here’s a tutorial by Jonathan Gao to whet your appetite:

U-SQL is a language that unifies the benefits of SQL with the expressive power of your own code to process all data at any scale. U-SQL’s scalable distributed query capability enables you to efficiently analyze data in the store and across relational stores such as Azure SQL Database. It enables you to process unstructured data by applying schema on read, insert custom logic and UDF’s, and includes extensibility to enable fine grained control over how to execute at scale. To learn more about the design philosophy behind U-SQL, please refer to this Visual Studio blog post.

You do need Data Lake Tools for Visual Studio, but it looks like you can run it locally.

The VS blog had something a month ago on the topic.  I’m not saying get on it…yet…


Kevin Feasel



It’s syntactic sugar, but oh, is it tasty.  From Manoj Pandey:

To make sure I checked the MSDN BOL and found that this is a new feature added to the SQL Server 2016 version. And as per this msdn article this enhancement has been add with theCTP 3 release.

For IF EXISTS option/syntax the MSDN BoL mentions: Conditionally drops the [object] only if it already exists.
This is a very helpful enhancement added to these DDL statements and would reduce a lot of effort and coding lines.

Create IF NOT EXISTS is also a thing.  A glorious thing.


November 2015
« Jan Dec »