Press "Enter" to skip to content

Category: Query Tuning

Estimated I/O Costs and Spills

Hugo Kornelis goes deep into the rabbit hole:

Typically, when I look at an operator that does I/O, I expect to see values in both the Estimated CPU Cost and the Estimated I/O Cost. An Index Seek locates specific data in an index. The CPU has to do some work, but most of the cost of this operator is the actual I/O. So when I look at an execution plan and I see an Index Seek that has an Estimated CPU Cost of 0.0001787 and an Estimated I/O Cost of 0.003125 (almost 20 times as much), I am not surprised. That’s an expected ratio.

Conversely, operators that don’t do any I/O should of course be estimated to have a zero I/O cost. And that is indeed the case. Operators such as Nested Loops, Assert, or Row Count Spool will always have an Estimated I/O Cost of zero.

And then there are some operators where the Estimated I/O Cost may or may not be zero.

Hugo then walks us through a case where the optimizer is promising you a bad time if you run the query.
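
If you want to check that ratio on your own queries without running them, one lightweight approach (my sketch, with a placeholder table and predicate) is to request the estimated plan and read the cost attributes from the plan XML:

    SET SHOWPLAN_XML ON;
    GO
    -- Any query that produces an Index Seek; dbo.Users and Id are placeholders.
    SELECT DisplayName FROM dbo.Users WHERE Id = 42;
    GO
    SET SHOWPLAN_XML OFF;
    GO
    -- In the returned plan XML, the seek's RelOp element carries the
    -- EstimateCPU and EstimateIO attributes Hugo is comparing.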

The Row Count Spool Operator

Hugo Kornelis dives into another operator:

The Row Count Spool operator is one of the four spool operators that SQL Server supports. It counts the number of rows in its input data, and can then later return that same number of rows, without having to call its child operators to produce the input again.

The Row Count Spool can be viewed as similar to Table Spool, but optimized for cases where the number of rows is relevant but their content is not. Because the content of the rows is not relevant, the operator does not need to use tempdb to store its input in a worktable; it only has to keep a running count as it reads the input. The other two spool operators have different use cases: Index Spool is used to enable the spool operator to return specific subsets of the input multiple times, and the Window Spool operator is used to support the ROWS and RANGE specifications of windowing functions.

Read on to see where this might be useful and when it may appear.
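
One classic pattern where it can show up is an uncorrelated NOT EXISTS, where only the presence of rows matters, never their contents. A hypothetical sketch (table names invented):

    -- Only the existence of rows in dbo.Blacklist matters here, so the
    -- optimizer may place a Row Count Spool on the inner side of the
    -- anti semi join rather than spooling rows to a worktable.
    SELECT p.ProductID
    FROM dbo.Products AS p
    WHERE NOT EXISTS (SELECT 1 FROM dbo.Blacklist);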

Speeding Up sp_helpdb

Dave Bland speeds up sp_helpdb:

When I run this on my computer, it usually takes between 500 and 1,000 ms. More on this later.

Now let’s take a look at what is happening behind the scenes with sp_helpdb. The first step is to populate a temporary table, #spdbdesc, with the database name, owner, creation date, and compatibility level. The code for the first step is below.
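
The excerpt stops before the code itself; as a rough sketch (simplified, not the verbatim system procedure source, which varies by version), that first step looks something like this:

    -- Simplified approximation of sp_helpdb's first step.
    INSERT INTO #spdbdesc (dbname, owner, created, dbid, cmptlevel)
    SELECT name,
           ISNULL(SUSER_SNAME(owner_sid), N'~~UNKNOWN~~'),
           CONVERT(nvarchar(11), create_date),
           database_id,
           compatibility_level
    FROM sys.databases;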

Watch Dave speed this up a bit.

A Trillion-Row Operator

Joe Obbish sets up a challenge:

48 billion rows for a single operator is certainly a large number for most workloads. I ended up completely missing the point and started wondering how quickly a query could process a trillion rows through a single operator. Eventually that led to a SQL performance challenge: what is the fastest that you can get an actual plan with at least one operator processing a trillion rows? The following rules are in play:

1. Start with no user databases
2. Any query can run up to MAXDOP 8
3. Temp tables may be created and populated but all such work needs to finish in 30 seconds or less
4. A RECOMPILE query hint must be present in the final query
5. Undocumented features and behavior are fair game
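
To make the goal concrete, here is a naive baseline of my own (emphatically not Joe's solution): since 10,000 cubed is a trillion, three cross joins of a 10,000-row temp table push 10^12 rows through the final aggregate.

    -- Stage a 10,000-row temp table; this part has to finish in 30 seconds.
    SELECT TOP (10000) IDENTITY(int, 1, 1) AS n
    INTO #t
    FROM sys.all_columns AS a
    CROSS JOIN sys.all_columns AS b;

    -- 10,000 * 10,000 * 10,000 = one trillion rows into the aggregate.
    SELECT COUNT_BIG(*)
    FROM #t AS t1
    CROSS JOIN #t AS t2
    CROSS JOIN #t AS t3
    OPTION (MAXDOP 8, RECOMPILE);

Run as-is, that count would take far longer than anyone wants to wait; the challenge is entirely in how fast you can make it go.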

Read on to see what Joe learned.

TOP and Ordering

Erik Darling is in the middle of a back-to-basics series on performance tuning:

And you see, once you set up a query to return the TOP N rows, there’s an expectation that users get to choose the order they start seeing rows in. As long as we stick to columns whose ordering is supported by an index, things will be pretty stable.

Once we go outside that, a TOP can be rough on a query.

Read on for an example of what happens when that type of thing goes wrong.
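
As a hypothetical illustration (assuming an index on CreationDate but none on Reputation):

    -- Order supported by an index: read 100 rows in index order and stop.
    SELECT TOP (100) Id, DisplayName
    FROM dbo.Users
    ORDER BY CreationDate;

    -- Order not supported by an index: every row must pass through a Sort
    -- before TOP can return its first row.
    SELECT TOP (100) Id, DisplayName
    FROM dbo.Users
    ORDER BY Reputation DESC;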

When Date Tables Go Bad

Brent Ozar walks through a scenario in which a calendar table (AKA, date dimension) makes a query perform quite a bit worse:

So why did the date table not perform as well as the old-school way?

SQL Server doesn’t understand the relationship between these two tables. It simply doesn’t know that all of the rows in the Users table will match up with rows in the calendar table. It assumes that the calendar table is doing some kind of filtering, especially given our CAST on the date. It doesn’t expect all of the rows to match.

My reaction was pretty much the same as Koen Verbeeck’s in the comments. Put in clearer terms, calendar tables work best when you’re joining a DATE type to a DATE type. Once you introduce times into the mix, the optimizer has to behave differently, not least because you have to do things like CAST() to coerce data types.
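
In code, the difference between the two shapes looks roughly like this (schema invented for illustration; CreationDateOnly stands in for a persisted DATE column):

    -- The CAST hides the relationship: the optimizer cannot tell that every
    -- Users row will find a calendar match, so its estimates suffer.
    SELECT COUNT(*)
    FROM dbo.Users AS u
    JOIN dbo.Calendar AS c
        ON c.DateKey = CAST(u.CreationDate AS date);

    -- DATE-to-DATE join: no conversion needed and much saner estimates.
    SELECT COUNT(*)
    FROM dbo.Users AS u
    JOIN dbo.Calendar AS c
        ON c.DateKey = u.CreationDateOnly;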

Benefits from Nonclustered Columnstore Indexes

Dave Mason shows off some places where non-clustered columnstore indexes can benefit you:

I tend to work mostly with OLTP environments. Many of them have questionable designs or serve reporting workloads. Not surprisingly, there are a lot of performance-sapping table scans and index scans. I’ve compensated for this somewhat by using row and page compression, which became available on all editions of SQL Server starting with SQL Server 2016 SP1. Could I get even better results with columnstore indexes? Let’s look at one example.

Here are four individual query statements from a stored procedure used to get data for a dashboard. If you add up percentages for Estimated Cost (CPU + IO), Estimated CPU Cost, or Estimated IO Cost, you get a total of about 90% (give or take a few percent).

Read on for the queries and to see how adding a non-clustered columnstore index helped in Dave’s case. I haven’t had a great deal of success with non-clustered columnstore indexes, but have greatly enjoyed the use of clustered columnstore indexes for fact tables.
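
For reference, adding one is a single statement; the table and column names below are placeholders rather than Dave's schema:

    -- A nonclustered columnstore index covering the columns the dashboard
    -- queries scan; the base rowstore table is left untouched.
    CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_Sales_Dashboard
        ON dbo.Sales (SaleDate, StoreID, Amount);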

Aggregate Splitting in SQL Server 2019

Paul White takes us through a new trick the optimizer has learned:

The extended event query_optimizer_batch_mode_agg_split is provided to track when this new optimization is considered. The description of this event is:

Occurs when the query optimizer detects batch mode aggregation is likely to spill and tries to split it into multiple smaller aggregations.

Other than that, this new feature hasn’t been documented yet. This article is intended to help fill that gap.

Read on as Paul fills that gap.
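
If you want to watch for the optimization yourself, the event Paul names can be captured with an ordinary Extended Events session; the session name and target here are my own arbitrary choices:

    CREATE EVENT SESSION agg_split_watch ON SERVER
    ADD EVENT sqlserver.query_optimizer_batch_mode_agg_split
    ADD TARGET package0.ring_buffer;

    ALTER EVENT SESSION agg_split_watch ON SERVER STATE = START;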

Halloween Problem and Inserts

Jared Poche continues a dive into the Halloween Problem:

I would have expected us to scan the temp table, then have a LEFT JOIN to the base table. The Table Spool is the red flag that we have an issue with the plan, and is frequently seen with Halloween protections.

The index scan on the base table seems to be overkill since we’re joining on the primary key columns (the key lookup isn’t much of a concern). But we’re likely doing the scan because of the spool; it’s SQL Server’s way of getting all relevant records in one place at one time, breaking the normal flow of row mode operation, to make sure we don’t look up the same record multiple times.

Read on to see the execution plan as well as Jared’s fix.
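
The general shape Jared describes is an “insert if not exists” pattern; with invented table names, it looks something like this. Because the INSERT reads the same table it writes, SQL Server may add a spool to separate the read phase from the write phase:

    -- Reading dbo.Target while inserting into it is what invites the
    -- Halloween protection spool.
    INSERT dbo.Target (Id, Val)
    SELECT s.Id, s.Val
    FROM #Staging AS s
    LEFT JOIN dbo.Target AS t
        ON t.Id = s.Id
    WHERE t.Id IS NULL;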

Seeks are Better than Scans, Except when they Aren’t

Hugo Kornelis explains that both seeks and scans exist for a good reason:

Fact: You should never blindly trust anything you find on the internet. And right now, you are reading the internet. So why should you trust this?
You shouldn’t. At least, not blindly. You should verify. And what better way to verify than through demos!

There went my strategy of blindly trusting Hugo.

The rule of thumb I’ve heard and go by: if you’re retrieving less than half of one percent of a table’s rows, a seek is the best route; if you’re returning more than 20%, a scan is the best route. In between is the “it depends” zone, where either could potentially be better. But please do read Hugo’s post; it’s an important one for query tuners.
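
If you want to run that kind of verification against your own data, table hints make the comparison straightforward; the query below is only a placeholder shape, and note that FORCESEEK errors out if no usable index exists:

    -- Compare the same query under each access method and judge by the
    -- actual I/O statistics rather than a rule of thumb.
    SET STATISTICS IO ON;

    SELECT * FROM dbo.Orders WITH (FORCESEEK) WHERE CustomerID = 42;
    SELECT * FROM dbo.Orders WITH (FORCESCAN) WHERE CustomerID = 42;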