Press "Enter" to skip to content

Category: Query Tuning

Hash Join Performance

Paul White wrote something.  That’s good enough to get tagged here:

It might be tempting to ask why the optimizer does not routinely add null-rejecting filters prior to equality joins. One can only suppose that this would not be beneficial in enough common cases. Most joins are not expected to encounter many null = null rejections, and adding predicates routinely could quickly become counter-productive, particularly if many join columns are present. For most joins, rejecting nulls inside the join operator is probably a better option (from a cost model perspective) than introducing an explicit Filter.

It does seem that there is an effort to prevent the very worst cases from manifesting through the post-optimization rewrite designed to reject null join rows before they reach the build input of a hash join. It seems that an unfortunate interaction exists between the effect of optimized bitmap filters and the application of this rewrite. It is also unfortunate that when this performance problem does occur, it is very difficult to diagnose from the execution plan alone.

I’m only on reading #2 of the post, so I’ve got a few more things yet to learn from it.
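
To make the idea concrete, here is a minimal sketch (table and column names are invented for illustration) of the kind of explicit null-rejecting predicate the quote describes. A row with a NULL join key can never satisfy an equality join, so filtering it out ahead of the join keeps it from ever reaching the build input of the hash join:

-- Hypothetical tables; the WHERE clause is the explicit null-rejecting Filter
-- the quote contrasts with rejecting NULLs inside the join operator itself.
SELECT f.OrderID,
       c.CustomerName
FROM dbo.FactOrders AS f
JOIN dbo.DimCustomer AS c
    ON f.CustomerKey = c.CustomerKey
WHERE f.CustomerKey IS NOT NULL;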


Grouping Clauses

William Brewer goes into nice detail on the grouping clauses ROLLUP, CUBE, and GROUPING SETS.

ROLLUP and CUBE had their heyday before SSAS. They were useful for providing the same sort of facilities offered by the cube in OLAP. They still have their uses, though. In AdventureWorks they are overkill, but if you are handling large volumes of data, you need to pass over your data only once and do as much as possible on data that has already been aggregated. Events that happened in the past can’t be changed, so it is seldom necessary to retain historic data on an active OLTP system. Instead, you only need to retain the aggregated data at the level of detail (‘granularity’) required for all foreseeable reports.

Imagine you are responsible for reporting on a telephone switch that handles two million or so calls a day. If you retain all these calls on your OLTP server, you are soon going to find the SQL Server labouring over usage reports. You have to retain the original call information for a statutory time period, but you determine from the business that they are, at most, only interested in the number of calls in a minute. Then you have reduced your storage requirement on the OLTP server to 1.4% of what it was, and the call records can be archived off to another SQL Server for ad-hoc queries and customer statements. That’s likely to be a saving worth making. The CUBE and ROLLUP clauses even allow you to store the row totals, column totals, and grand totals without having to do a table or clustered index scan of the summary table.

As long as changes aren’t made retrospectively to this data, and all time periods are complete, you never have to repeat or alter the aggregations based on past time-periods, though grand totals will need to be over-written!
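
As a rough sketch of the telephone-switch idea above (the summary table and its columns are hypothetical), ROLLUP produces the per-minute rows plus per-day subtotals and a grand total in a single pass over the aggregated data:

-- Hypothetical per-minute summary table; ROLLUP adds subtotal and grand-total rows.
SELECT CallDate,
       CallMinute,
       SUM(CallCount) AS TotalCalls
FROM dbo.CallSummary
GROUP BY ROLLUP (CallDate, CallMinute);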

I’ve used ROLLUP and GROUPING SETS fairly regularly, but not so much CUBE.  Read the whole thing and see whether that aggregation and summarization monster you have to maintain could be rewritten in a much simpler way.
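
If you only want specific levels of aggregation rather than the full ROLLUP or CUBE hierarchy, GROUPING SETS lets you spell them out explicitly. A minimal sketch against the same hypothetical summary table:

-- Per-minute detail, per-day subtotals, and a grand total; nothing else.
SELECT CallDate,
       CallMinute,
       SUM(CallCount) AS TotalCalls
FROM dbo.CallSummary
GROUP BY GROUPING SETS ((CallDate, CallMinute), (CallDate), ());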

Via Database Weekly.


Session Wait Stats

SQL Server 2016 has a per-session wait stats DMV: sys.dm_exec_session_wait_stats.  That’s exciting; wait stats are extremely interesting, but until now they have been impossible to use at the session level in a busy production system (where you’d most want to use them).

Daniel Farina looks at how the new DMV relates to sys.dm_os_wait_stats (via Database Weekly):

[R]esetting the data of sys.dm_os_wait_stats operating system view doesn’t affect the values of sys.dm_exec_session_wait_stats view.

Based on my MSDN reading, the sys.dm_exec_session_wait_stats DMV resets when the session closes or when connection pooling resets the session.  This is why DBCC SQLPERF doesn’t include a reset option for session-specific wait stats.
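
Here’s a quick sketch of using the new DMV (SQL Server 2016 or later). The instance-level view can still be cleared with DBCC SQLPERF, but as the quote above notes, that does not touch the per-session numbers:

-- Top waits for the current session; values reset when the session ends
-- or when connection pooling resets it.
SELECT wait_type,
       waiting_tasks_count,
       wait_time_ms,
       signal_wait_time_ms
FROM sys.dm_exec_session_wait_stats
WHERE session_id = @@SPID
ORDER BY wait_time_ms DESC;

-- Clears the instance-level view only, not sys.dm_exec_session_wait_stats.
DBCC SQLPERF ('sys.dm_os_wait_stats', CLEAR);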
