Press "Enter" to skip to content

Category: Query Tuning

Exchange Demand Partitioning and Parallel Queries

Joe Obbish takes us through a fun concurrency problem:

Very little has been written about exchange operators with a partitioning type of demand, so I forgive you for not hearing of it before today. There is a brief explanation available here, an example of using demand partitioning to improve some query plans involving partitioned tables, and a Stack Exchange answer for someone comparing round robin and demand partitioning. You have the honor of reading perhaps the fourth blog post about the subject.

Read on for an in-depth look at the problem.
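
If you want to check whether any of your own cached plans use demand partitioning, one option is to search the showplan XML, which records each exchange's partitioning type. A minimal sketch, assuming the PartitioningType attribute of the Parallelism element carries the value "Demand" (worth verifying against the showplan schema for your SQL Server version):

-- Sketch: find cached plans containing an exchange (Parallelism)
-- operator whose partitioning type is Demand.
WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT TOP (20) cp.usecounts, qp.query_plan
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
WHERE qp.query_plan.exist('//RelOp/Parallelism[@PartitioningType="Demand"]') = 1;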

Misleading Query Plan Estimates from Missing Nodes

Hugo Kornelis points out another issue to watch for when checking arrow widths while reading execution plans:

The issue I will talk about today is caused when a node is missing, due to a post-optimization rewrite. I myself have so far only seen this in execution plans where a bitmap filter (created by a Bitmap operator, by a Batch Hash Table Build operator, or by a Hash Match operator with a BitmapCreator property) was pushed into an Index Scan, Clustered Index Scan, or Columnstore Index Scan operator. So that’s what I’ll focus on here. But do keep in mind that there may be other situations where a post-optimization rewrite removes a node.

To understand the root cause of this (and, credit where credit is due, I only understood this myself after getting it explained by Paul White, whose explanation I am now paraphrasing here), we need to dig deeper into the internals of the optimizer.

This is where you put on the internals mining cap and watch out for falling rocks.
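
To reproduce the shape Hugo describes, a minimal sketch with invented Sales and Customer tables: in a parallel row-mode plan, a hash join can build a bitmap from its small, filtered build side and push it into the scan of the large probe side, where it appears as a PROBE(...) residual on the scan rather than as a separate Bitmap node.

-- Sketch (hypothetical tables): a hash join whose build side is a
-- selective filter. In a parallel plan, the bitmap built from the
-- filtered customers may be pushed into the scan of dbo.Sales.
SELECT s.OrderID, s.Amount
FROM dbo.Sales AS s
JOIN dbo.Customer AS c
  ON c.CustomerID = s.CustomerID
WHERE c.Region = N'NW'
OPTION (HASH JOIN); -- nudge the optimizer toward the hash join shape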

Rows Read Versus Rows Returned

Hugo Kornelis explains another issue with arrow widths in tools like SQL Server Management Studio:

The visual of the arrows in an execution plan strongly suggests that they represent the flow of rows from one operator to another. And hence, the width of that arrow strongly appears to be an indication of how many rows are passed between the connected operators. That used to be always the case. But unfortunately, this changed in December 2017 (or earlier, but that was when I first noticed the change).

The root cause is the addition of a new property, returned by Scan and Seek operators: Number of Rows Read (as well as its estimated counterpart: Estimated Number of Rows to be Read). Now don’t get me wrong, those properties are awesome and I’m super happy that they were added to execution plans. When a filter condition is pushed into a scan or seek operator as a Predicate property, I want to be able to see how effective it is, and comparing Actual Number of Rows to Number of Rows Read (or Estimated Number of Rows to Estimated Number of Rows to be Read) helps me assess just that. Which in turn can help me decide whether I should change my indexing, or try to rewrite the query. But I digress.

Click through to learn what the issue is. Hugo describes a tricky situation where there are two valuable measures but only one way to show them. If you agree with Hugo’s preferences, here’s a Feedback item for you.
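
To see the two measures diverge, a minimal sketch (table and column names invented): with no index on the filtered column, the filter is pushed into the scan as a Predicate property, so the operator reads every row but returns only the matches.

-- Sketch: assumes dbo.Orders has no index on OrderDate. Number of
-- Rows Read counts every row the scan touched; Actual Number of
-- Rows counts only those that passed the pushed Predicate.
SELECT OrderID, OrderDate
FROM dbo.Orders
WHERE OrderDate >= '20240101';

A large gap between the two counters is the signal Hugo describes for reconsidering the indexing or rewriting the query.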

When Arrow Widths Mislead

Hugo Kornelis provides one example of misleading arrow widths in execution plans:

In my previous post, I talked about the Actual Number of Rows and Estimated Number of Rows properties, their visual representation in execution plans, and the most important ways in which this can be used.

But life is not always perfect, and Microsoft likes to remind us of that. Sometimes, these properties report values in weird and confusing ways. Sometimes the arrow width, as the visual representation in the execution plan, misleads us.

Let’s look at one such case.

Read the whole thing and vote for Hugo’s Feedback item to improve this experience.

Row Counts and Arrow Widths in SSMS

Hugo Kornelis takes us through one of the niceties SQL Server Management Studio (and subsequent tools like SentryOne Plan Explorer and Azure Data Studio) has for reading execution plans:

Because both the Actual Number of Rows and the Estimated Number of Rows are very valuable for many use cases, Microsoft has built their primary tool for viewing execution plans (SQL Server Management Studio) to highlight the value of at least one of these two properties, in a visual way, when you look at the graphical representation of an execution plan.

Look at the arrows that show which operator (is called by and) returns rows to which operator, and that therefore can be thought of as representing the data stream. In most execution plans, they are not all the same width. Management Studio uses the width of the arrow as a quick, simple visual representation of the value of Actual Number of Rows (in an execution plan plus run-time statistics, aka “execution plan plus” or “actual execution plan”), or of the value of the Estimated Number of Rows in an execution plan only (aka “estimated execution plan”).

This first post is the happy path post, where the information is correct and useful. Later posts promise to cover cases in which this does not hold.
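
For reference, the run-time counts behind those arrow widths require an execution plan plus run-time statistics; one way to capture one from a query window is the following (SSMS renders the showplan XML this returns):

-- Capture an "actual" plan for the statement in between; the arrows
-- in the rendered plan are then sized by Actual Number of Rows.
SET STATISTICS XML ON;
SELECT COUNT(*) FROM sys.objects;
SET STATISTICS XML OFF;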

Issues with Window Functions and Views / Derived Tables

Paul White takes us through some of the difficulties you are liable to see when using window functions in views and derived tables:

Our expectation is that the execution plan for this new query will be exactly the same as before we created the view. The query optimizer should be able to push the product filter specified in the WHERE clause down into the view, resulting in an index seek.

We need to stop and think a bit at this point, however. The query optimizer can only produce execution plans that are guaranteed to produce the same results as the logical query specification. Is it safe to push our WHERE clause into the view?

Read the whole thing.
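
To make the question concrete, a sketch with invented names: a predicate on the PARTITION BY column can safely be pushed below the window function, because removing whole partitions cannot change the numbering of the rows that remain; a predicate on most other columns could, so the optimizer must not push it.

-- Sketch: a view containing a window function.
CREATE VIEW dbo.RankedOrders
AS
SELECT
    ProductID,
    OrderID,
    ROW_NUMBER() OVER (PARTITION BY ProductID ORDER BY OrderID) AS rn
FROM dbo.OrderLines;
GO
-- Filtering on the PARTITION BY column is safe to push below the
-- ROW_NUMBER, so this can become a seek on a ProductID index.
SELECT ProductID, OrderID, rn
FROM dbo.RankedOrders
WHERE ProductID = 870;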

The Halloween Problem, Continued

Paul White continues a series on the Halloween Problem. Part 2 looks at insert and delete statements:

In the first part of this series, we saw how the Halloween Problem applies to UPDATE queries. To recap briefly, the problem was that an index used to locate records to update had its keys modified by the update operation itself (another good reason to use included columns in an index rather than extending the keys). The query optimizer introduced an Eager Table Spool operator to separate the reading and writing sides of the execution plan to avoid the problem. In this post, we will see how the same underlying issue can affect INSERT and DELETE statements.
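
A quick illustration of the INSERT flavor, as a sketch with an invented table: when the insert target is also the source, the reading side could otherwise encounter the rows being written, so the optimizer separates the phases, typically with an Eager Table Spool that reads everything before the insert begins.

-- Sketch: a self-insert that needs Halloween protection. Without
-- phase separation, the scan could pick up newly inserted rows.
INSERT INTO dbo.Staging (Val)
SELECT Val + 1
FROM dbo.Staging;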

Part 3 looks at the train wreck MERGE operator:

In the right circumstances, the SQL Server optimizer can recognize that the MERGE statement is hole-filling, which is just another way of saying that the statement only adds rows where there is an existing gap in the target table’s key.

For this optimization to be applied, the values used in the WHEN NOT MATCHED BY TARGET clause must exactly match the ON part of the USING clause. Also, the target table must have a unique key (a requirement satisfied by the PRIMARY KEY in the present case).

Where these requirements are met, the MERGE statement does not require protection from the Halloween Problem.

If only it weren’t busted in so many other ways!
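
A sketch of the hole-filling shape, with invented Target and Source tables, instantiating the two requirements quoted above: the WHEN NOT MATCHED branch inserts exactly the key values compared in the ON clause, and the target's key is unique.

-- Sketch: hole-filling MERGE. Id is the PRIMARY KEY of dbo.Target,
-- and the inserted Id exactly matches the ON clause, so the
-- optimizer can prove Halloween protection is unnecessary.
MERGE dbo.Target AS t
USING dbo.Source AS s
    ON t.Id = s.Id
WHEN NOT MATCHED BY TARGET THEN
    INSERT (Id, Val) VALUES (s.Id, s.Val);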

Part 4 wraps up the series:

The SQL Server optimizer has specific features that allow it to reason about the level of Halloween Protection (HP) required at each point in the query plan, and the detailed effect each operator has. These extra features are incorporated into the same property framework the optimizer uses to keep track of hundreds of other important bits of information during its search activities.

Each operator has a required HP property and a delivered HP property. The required property indicates the level of HP needed at that point in the tree for correct results. The delivered property reflects the HP provided by the current operator and the cumulative HP effects provided by its subtree.

This last one goes into some nice detail.

Reading Query Plans in Spark

Daniel Ciocirlan has a primer on query plans in Apache Spark:

Let’s go over some examples of query plans and how to read them. Let’s go back to the one we’ve just shown:

== Physical Plan ==
*(1) Project [(id#0L * 5) AS id#2L]
+- *(1) Range (1, 1000000, step=1, splits=6)

We read this plan backwards, bottom to top: the Range operator at the bottom generates the id values (across six splits), and the Project above it computes id * 5 for each of those rows.

Spark does have some UI components which make this a bit easier, but you’ll probably end up in a situation where you need to read it in this format.
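
If you want to generate a plan like this yourself, Spark SQL can print one directly; a minimal sketch using its built-in range table-valued function, equivalent to the DataFrame code behind the plan above:

-- EXPLAIN prints the physical plan; this query is equivalent to
-- spark.range(1, 1000000).select($"id" * 5).
EXPLAIN SELECT id * 5 AS id FROM range(1, 1000000);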

I Remember Halloween

Paul White talks about the Halloween Problem:

Much has been written over the years about understanding and optimizing SELECT queries, but rather less about data modification. This series looks at an issue that is specific to INSERT, UPDATE, DELETE, and MERGE queries – the Halloween Problem.

The phrase “Halloween Problem” was originally coined with reference to a SQL UPDATE query that was supposed to give a 10% raise to every employee who earned less than $25,000. The problem was that the query kept giving 10% raises until everyone earned at least $25,000.

We will see later on in this series that the underlying issue also applies to INSERT, DELETE, and MERGE queries, but for this first entry, it will be helpful to examine the UPDATE problem in a bit of detail.

This is a classic problem in data management and has led to a good bit of confusion over time about why database updates can perform worse than you’d expect.
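
For reference, the classic statement looks something like this sketch (schema invented): if an index on Salary is used to locate the rows, each raise can move a row ahead of the current scan position, where it is found and raised again.

-- Sketch of the classic Halloween statement. SQL Server prevents the
-- repeat-raise behavior with phase separation, e.g. an Eager Table
-- Spool between the reading and writing sides of the plan.
UPDATE dbo.Employees
SET Salary = Salary * 1.10
WHERE Salary < 25000;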

When a Non-Clustered Index on Clustered Columns Makes Sense

Allen White gives us a scenario where adding a non-clustered index on the same column as the clustered index can make sense:

Recently I was asked about adding a non-clustered index to a table (let’s call it Images) with just one column. It had been added in the development database and it improved performance dramatically. I looked at it and it had the same key as the clustered index on that table.

In reviewing the query I saw that Images was joined to the other tables in the query, but none of the columns were used, so Images was joined to ensure that values from the other tables had rows in Images. The query plan shows a significantly higher number of reads against Images without the new NCI (non-clustered index) than when it’s present.

I do agree that this can help—as we obviously see. The backseat query tuner in me wonders if maybe there’s another way to write the query to prevent the scan by using CROSS APPLY, but that’d only help if they were getting a small percentage of rows from the parent table expression built from the combination of the clustered index scan and index seek in the second example.
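
A sketch of the index in question, with invented names: a single-column non-clustered index on the clustering key carries only that key, so it is far narrower than the clustered index, and a join that only needs to verify a row exists in Images can read it with far fewer pages.

-- Sketch: Images is assumed to be clustered on ImageId. This narrow
-- duplicate of the key gives existence-check joins a cheaper access
-- path than scanning the full clustered index.
CREATE NONCLUSTERED INDEX IX_Images_ImageId
    ON dbo.Images (ImageId);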
