Press "Enter" to skip to content

Category: Query Tuning

Finding Indexes Not in Use

Dennes Torres takes us through a few iterations of a query to find indexes not in use:

It doesn't matter if you are trying to remove indexes for good reasons or just to work around a bad environment; let's see in more detail how to find which indexes don't have enough usage to justify their existence.

First, some basic definitions, without going into many details:

Index Seek: That's the best and most desirable use of an index. It means the index tree is being used to go directly to the records we need.

Index Scan: Not as good as an index seek, so it could be better. However, sometimes even an index scan is good: a nonclustered index scan means the pages of that index are smaller, and better suited to a scan, than the pages of the clustered index. There are many variations that make an index scan good, but most of the time you don't need to reach this level of analysis; you may reach your objective by analyzing index seeks alone.

Update: When fields are updated (by update, insert, or delete statements), all indexes which contain those fields need to be updated as well. Indexes are a balance: we increase performance on reading and suffer a bit more when writing. The problem is when the writing happens more than the reading.
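
Dennes builds on the index usage DMV. As a rough starting point (a minimal sketch of my own, not Dennes's query), comparing reads against writes per index looks something like this:

    -- Minimal sketch: compare reads (seeks/scans/lookups) to writes per index.
    -- Note that sys.dm_db_index_usage_stats resets on instance restart.
    SELECT o.name AS table_name,
           i.name AS index_name,
           s.user_seeks, s.user_scans, s.user_lookups, s.user_updates
    FROM sys.indexes AS i
    INNER JOIN sys.objects AS o
        ON o.object_id = i.object_id
    LEFT JOIN sys.dm_db_index_usage_stats AS s
        ON s.object_id = i.object_id
       AND s.index_id = i.index_id
       AND s.database_id = DB_ID()
    WHERE o.is_ms_shipped = 0
      AND i.type_desc = 'NONCLUSTERED'
    ORDER BY s.user_updates DESC;

Indexes showing many updates and no seeks, scans, or lookups are the candidates worth a closer look.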

Read on to see Dennes's query evolve and bring important information to the table. For example, it's not just how often a particular index gets used; it's also how important the queries are which use it. An index may only be used once a month, but if it turns the most important report the CEO cares about from running in 4 hours to running in 4 seconds, you can bet that index is staying.

Eager Spools in SQL Server

Erik Darling describes the concept of eager spooling:

Not only does SQL Server create an index for you, it doesn’t really tell you about it. There’s no loud warning here.

It also throws that index away when the query is done executing. It only exists in tempdb while the query executes, and it’s only available to the query that builds it.

And boy, they sure can take a long time to build.

I enjoy and frequently use Erik’s depiction that an eager spool is SQL Server’s passive-aggressive way of telling you that you need an index.
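
For a concrete feel, here is one shape that can provoke an eager index spool: a correlated subquery filtering on a column with no supporting index. The tables are hypothetical, and whether the spool actually appears depends on row counts and costing:

    -- No index exists on p.OwnerUserId, so the optimizer may build one itself
    -- as an Eager Index Spool on the inner side of the join, then throw it away.
    SELECT u.Id, u.DisplayName
    FROM dbo.Users AS u
    WHERE EXISTS (SELECT 1
                  FROM dbo.Posts AS p
                  WHERE p.OwnerUserId = u.Id
                    AND p.Score > 100);

The remedy is usually the one the spool is hinting at: create that index yourself so SQL Server stops building and discarding it on every execution.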

Performance Impacts of Computed Columns

Robert Sheldon takes us through a few scenarios:

In this article, I walk you through the process of applying these strategies so you have a better sense of the available options. For the examples, I created four similar tables and populated them with identical data, which comes from the WideWorldImporters sample database. Each table includes the same computed column, with the column persisted in two tables and indexed in two tables, resulting in the following mix:

– The Orders1 table includes a non-persisted computed column.
– The Orders2 table includes a persisted computed column.
– The Orders3 table includes an indexed, non-persisted computed column.
– The Orders4 table includes an indexed, persisted computed column.

For each table, I show you the execution plan that's generated when querying the computed column. The column's expression is a relatively simple one, and the data set is very small. Even so, this should be enough to demonstrate the principles of creating persisted and indexed computed columns and how they can help address performance-related issues.
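
As a rough sketch of what those variants look like in DDL (a hypothetical table, not Robert's WideWorldImporters setup):

    -- Non-persisted: computed at read time.
    CREATE TABLE dbo.Orders1
    (
        OrderID   int            NOT NULL PRIMARY KEY,
        Quantity  int            NOT NULL,
        UnitPrice decimal(18, 2) NOT NULL,
        LineTotal AS (Quantity * UnitPrice)
    );

    -- Persisted: stored with the row and maintained on writes.
    CREATE TABLE dbo.Orders2
    (
        OrderID   int            NOT NULL PRIMARY KEY,
        Quantity  int            NOT NULL,
        UnitPrice decimal(18, 2) NOT NULL,
        LineTotal AS (Quantity * UnitPrice) PERSISTED
    );

    -- A deterministic, precise computed column can be indexed even when it is
    -- not persisted, which is the Orders3 variant.
    CREATE TABLE dbo.Orders3
    (
        OrderID   int            NOT NULL PRIMARY KEY,
        Quantity  int            NOT NULL,
        UnitPrice decimal(18, 2) NOT NULL,
        LineTotal AS (Quantity * UnitPrice)
    );
    CREATE INDEX IX_Orders3_LineTotal ON dbo.Orders3 (LineTotal);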

Click through for Robert’s results.

Columnstore Query Patterns

Ed Pollack walks us through some query patterns which do and don’t work very well with columnstore indexes:

Reading data from a highly compressed analytical structure is quite different from the query patterns used on transactional data. By leveraging metadata, data order, segment elimination, and compression, large tables can be quickly read and results returned in seconds (or less!).

Taking this further, read queries against columnstore indexes can be further optimized by simplifying queries and providing the query optimizer with the easiest path to the smallest columnstore scans needed to return results.

This article explores the most efficient ways to read a columnstore index and produces guidelines and best practices for analytics against large columnstore data structures.
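
One of the core patterns here is segment elimination: filter on a column the data was ordered by at load time, and SQL Server can skip entire rowgroups using their min/max metadata. A minimal sketch with a hypothetical fact table:

    -- Hypothetical fact table with a clustered columnstore index, loaded in
    -- SaleDate order so each rowgroup covers a narrow date range.
    CREATE TABLE dbo.FactSales
    (
        SaleDate date           NOT NULL,
        StoreID  int            NOT NULL,
        Amount   decimal(18, 2) NOT NULL,
        INDEX CCI_FactSales CLUSTERED COLUMNSTORE
    );

    -- Rowgroups whose SaleDate min/max fall outside the range are never read.
    SELECT SaleDate, SUM(Amount) AS TotalAmount
    FROM dbo.FactSales
    WHERE SaleDate >= '20240101'
      AND SaleDate <  '20240201'
    GROUP BY SaleDate;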

Read on for good advice.

Estimated I/O Costs and Spills

Hugo Kornelis goes deep into the rabbit hole:

Typically, when I look at an operator that does I/O, I expect to see values in both the Estimated CPU Cost and the Estimated I/O Cost. An Index Seek locates specific data in an index. The CPU has to do some work, but most of the cost of this operator is the actual I/O. So when I look at an execution plan and I see an Index Seek that has an Estimated CPU Cost of 0.0001787 and an Estimated I/O Cost of 0.003125 (almost 20 times as much), I am not surprised. That's an expected ratio.

Conversely, operators that don't do any I/O should of course be estimated to have a zero I/O cost. And that is indeed the case. Operators such as Nested Loops, Assert, or Row Count Spool will always have an Estimated I/O Cost of zero.

And then there are some operators where the Estimated I/O Cost may or may not be zero.
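
If you want to check these numbers on your own queries without running anything, the estimated plan exposes them directly; in the showplan XML, each RelOp node carries EstimateCPU and EstimateIO attributes. A quick way to capture it (the sample query is hypothetical):

    -- Request the estimated plan only; the query itself does not execute.
    SET SHOWPLAN_XML ON;
    GO
    SELECT FirstName, LastName
    FROM dbo.SomeTable
    WHERE LastName = N'Smith';
    GO
    SET SHOWPLAN_XML OFF;
    GO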

Hugo then walks us through a case where the optimizer is promising you a bad time if you run the query.

The Row Count Spool Operator

Hugo Kornelis dives into another operator:

The Row Count Spool operator is one of the four spool operators that SQL Server supports. It counts the number of rows in its input data, and can then later return that same number of rows, without having to call its child operators to produce the input again.

The Row Count Spool can be viewed as similar to Table Spool, but optimized for cases where the number of rows is relevant but their content is not. Because the content of the rows is not relevant, the operator does not need to use tempdb to store its input in a worktable; it only has to keep a running count as it reads the input. The other two spool operators have different use cases: Index Spool is used to enable the spool operator to return specific subsets of the input multiple times, and the Window Spool operator is used to support the ROWS and RANGE specifications of windowing functions.
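
One shape where a Row Count Spool can turn up (depending on costing) is a subquery whose rows are checked for existence but never actually read, replayed across outer rows. The tables here are hypothetical:

    -- The NOT EXISTS check does not depend on t1, so the optimizer can evaluate
    -- it once, spool only the row count, and replay that result for every outer
    -- row instead of rescanning t2.
    SELECT t1.id
    FROM dbo.t1
    WHERE NOT EXISTS (SELECT 1 FROM dbo.t2 WHERE t2.flag = 0);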

Read on to see where this might be useful and when it may appear.

Speeding Up sp_helpdb

Dave Bland speeds up sp_helpdb:

When I run this on my computer, it usually takes between 500 and 1,000 ms. More on this later.

Now let's take a look at what is happening behind the scenes with sp_helpdb. The first step is to populate a temporary table, #spdbdesc, with the database name, owner, creation date, and compatibility level.
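
To reproduce the baseline timing yourself, per-statement CPU and elapsed time are easy to capture:

    -- Reports CPU time and elapsed time for each statement in the session.
    SET STATISTICS TIME ON;
    EXEC sys.sp_helpdb;
    SET STATISTICS TIME OFF;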

Watch Dave speed this up a bit.

A Trillion-Row Operator

Joe Obbish sets up a challenge:

48 billion rows for a single operator is certainly a large number for most workloads. I ended up completely missing the point and started wondering how quickly a query could process a trillion rows through a single operator. Eventually that led to a SQL performance challenge: what is the fastest that you can get an actual plan with at least one operator processing a trillion rows? The following rules are in play:

1. Start with no user databases
2. Any query can run up to MAXDOP 8
3. Temp tables may be created and populated but all such work needs to finish in 30 seconds or less
4. A RECOMPILE query hint must be present in the final query
5. Undocumented features and behavior are fair game
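
The brute-force approach to satisfying these rules is a chain of cross joins over a small temp table; the sketch below is rules-legal on paper but would take ages to finish, which is exactly why Joe's optimized attempts are worth reading:

    -- 1,000 rows cross joined four times = 10^12 rows through the join chain.
    CREATE TABLE #t (i int NOT NULL);

    INSERT #t (i)
    SELECT TOP (1000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM sys.all_columns;

    SELECT COUNT_BIG(*)
    FROM #t AS a
    CROSS JOIN #t AS b
    CROSS JOIN #t AS c
    CROSS JOIN #t AS d
    OPTION (MAXDOP 8, RECOMPILE);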

Read on to see what Joe learned.

TOP and Ordering

Erik Darling is in the middle of a back-to-basics series on performance tuning:

And you see, once you set up a query to return the TOP N rows, there’s an expectation that users get to choose the order they start seeing rows in. As long as we stick to columns whose ordering is supported by an index, things will be pretty stable.

Once we go outside that, a TOP can be rough on a query.
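
A minimal illustration of that boundary, assuming a hypothetical dbo.Posts table with an index on CreationDate but not on Score:

    -- Supported by the index: rows stream out in index order; no Sort operator.
    SELECT TOP (10) Id, CreationDate
    FROM dbo.Posts
    ORDER BY CreationDate DESC;

    -- Not supported by any index: every qualifying row must be sorted before
    -- the TOP can return a single row.
    SELECT TOP (10) Id, Score
    FROM dbo.Posts
    ORDER BY Score DESC;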

Read on for an example of what happens when that type of thing goes wrong.

When Date Tables Go Bad

Brent Ozar walks through a scenario in which a calendar table (AKA, date dimension) makes a query perform quite a bit worse:

So why did the date table not perform as well as the old-school way?

SQL Server doesn’t understand the relationship between these two tables. It simply doesn’t know that all of the rows in the Users table will match up with rows in the calendar table. It assumes that the calendar table is doing some kind of filtering, especially given our CAST on the date. It doesn’t expect all of the rows to match.

My reaction was pretty much the same as Koen Verbeeck’s in the comments. Put in clearer terms, calendar tables work best when you’re joining a DATE type to a DATE type. Once you introduce times into the mix, the optimizer has to behave differently, not least because you have to do things like CAST() to coerce data types.
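
A sketch of the two shapes, with hypothetical table and column names:

    -- The per-row CAST hides the relationship from the optimizer, so it
    -- assumes the calendar table filters rows and underestimates the match.
    SELECT c.CalendarDate, COUNT_BIG(*) AS UserCount
    FROM dbo.Users AS u
    INNER JOIN dbo.Calendar AS c
        ON c.CalendarDate = CAST(u.CreationDate AS date)
    GROUP BY c.CalendarDate;

    -- Friendlier: persist the DATE once, so the join is DATE-to-DATE.
    ALTER TABLE dbo.Users
        ADD CreationDateOnly AS CAST(CreationDate AS date) PERSISTED;

    SELECT c.CalendarDate, COUNT_BIG(*) AS UserCount
    FROM dbo.Users AS u
    INNER JOIN dbo.Calendar AS c
        ON c.CalendarDate = u.CreationDateOnly
    GROUP BY c.CalendarDate;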
