Press "Enter" to skip to content

Category: Query Tuning

Execution Plan Operator Timings

Paul White dives into a murky problem:

SQL Server 2014 SP2 and later produce runtime (“actual”) execution plans that can include elapsed time and CPU usage for each execution plan operator (see KB3170113 and this blog post by Pedro Lopes).

Interpreting these numbers is not always as straightforward as one might expect. There are important differences between row mode and batch mode execution, as well as tricky issues with row mode parallelism. SQL Server makes some timing adjustments in parallel plans to promote consistency, but they are not perfectly implemented. This can make it difficult to draw sound performance-tuning conclusions.

This article aims to help you understand where the timings come from in each case, and how they can be best interpreted in context.

My default advice is to read what Paul White writes, and this is no exception.
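If you want to poke at these numbers yourself, the per-operator timings live in the actual plan's showplan XML. Here is a minimal sketch, using an AdventureWorks table as a stand-in for whatever query you care about:

SET STATISTICS XML ON;    -- request an actual (post-execution) plan

SELECT COUNT(*)
FROM Sales.SalesOrderHeader;

SET STATISTICS XML OFF;

-- In the returned showplan XML, each operator's <RunTimeInformation> element
-- contains <RunTimeCountersPerThread> entries with ActualElapsedms and
-- ActualCPUms attributes; those are the timings Paul is interpreting.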

Comments closed

Batch Mode on Row Store in SQL Server 2019

Deepthi Goguri looks at a nice performance improvement in SQL Server 2019:

In the previous post, we learned about table variable deferred compilation. In this blog, let's focus on the batch mode on rowstore feature introduced in SQL Server 2019. This feature improves the performance of analytical queries by using batch mode query processing, optimizing CPU usage so that analytical queries run faster. We do not have to specify this option if the database compatibility level is 150.

This feature is aimed at CPU-bound analytic workloads and does not require columnstore indexes. We can also use query hints to explicitly request or disable batch mode.

There are specific rules which must be met before it kicks in, but the performance benefit can be significant. If you're running SQL Server 2017, you need a columnstore index on a table to get batch mode, though there is a trick around this: you can create a filtered, nonclustered columnstore index WHERE 1=0 so that it doesn't have any rows. Then, any queries which hit that table are potentially eligible for batch mode processing, even though none of them use the columnstore index.
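A minimal sketch of that workaround on a hypothetical table. Because a filtered index predicate has to reference a column, the contradictory filter below plays the role of the WHERE 1=0 described above: the index can never contain rows, but its presence makes the table eligible for batch mode.

-- Hypothetical table; any existing rowstore table works the same way.
CREATE TABLE dbo.SalesFacts
(
    SaleID   int   NOT NULL,
    SaleDate date  NOT NULL,
    Amount   money NOT NULL
);

-- Empty, filtered nonclustered columnstore index: the contradictory predicate
-- guarantees it never stores any rows, yet queries against dbo.SalesFacts
-- become candidates for batch mode processing.
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_SalesFacts_BatchModeTrick
ON dbo.SalesFacts (SaleID)
WHERE SaleID = -1 AND SaleID = -2;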

Comments closed

Table Variable Deferred Compilation

Deepthi Goguri takes us through a fairly new feature in SQL Server:

With SQL Server 2017 and earlier versions, SQL Server always estimates one row for a table variable, because the table variable's data gets inserted at runtime, so the optimizer doesn't know how many rows to expect coming out of it. Due to this bad estimation, the performance of queries is affected.

Click through to see how this has changed in SQL Server 2019.
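As a rough illustration of the estimate problem (using AdventureWorks tables), the join below is compiled with a one-row guess for @Orders under compatibility level 140 and earlier, while level 150 defers compilation until the table variable is loaded and uses the real row count.

DECLARE @Orders table (SalesOrderID int NOT NULL PRIMARY KEY);

INSERT INTO @Orders (SalesOrderID)
SELECT SalesOrderID
FROM Sales.SalesOrderHeader;          -- roughly 31,000 rows in AdventureWorks

-- Compat level 140: the optimizer assumes 1 row comes out of @Orders.
-- Compat level 150: deferred compilation waits until @Orders is populated,
-- so the estimate reflects the actual row count.
SELECT o.SalesOrderID, d.ProductID, d.LineTotal
FROM @Orders AS o
JOIN Sales.SalesOrderDetail AS d
    ON d.SalesOrderID = o.SalesOrderID;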

Comments closed

Temporal Tables and Execution Plans

Hugo Kornelis starts a new sub-series within an existing series:

Welcome to part sixteen of the plansplaining series, the first of a few posts about how temporal tables affect execution plans. In this post, I'll build on the more generic discussion of data modification in the previous four posts. Later posts will look at data retrieval and some specific scenarios.

Hugo hits the highlights of temporal tables and how they handle insertion, deletion, and updating scenarios.
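For context, here is a minimal system-versioned temporal table (names are illustrative). Every insert, update, or delete against it also has to maintain the period columns and, for updates and deletes, write the prior row version to the history table, which is where the extra work in these execution plans comes from.

CREATE TABLE dbo.Product
(
    ProductID  int           NOT NULL CONSTRAINT PK_Product PRIMARY KEY,
    Name       nvarchar(100) NOT NULL,
    Price      money         NOT NULL,
    ValidFrom  datetime2     GENERATED ALWAYS AS ROW START NOT NULL,
    ValidTo    datetime2     GENERATED ALWAYS AS ROW END   NOT NULL,
    PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.ProductHistory));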

Comments closed

Disorderly Queries

Chad Callihan wants you to think about that ORDER BY clause:

I recently came across a scenario where an application process was not performing correctly on one database but was working fine on others. The process should have been completing in seconds but was taking minutes with no indication of activity. After some investigation, I found that the process was stuck waiting on a SELECT statement to complete. Even worse, it was holding an exclusive lock on a table which was then blocking new information from processing.

One part of the SELECT query that stood out was that it was ordering by a date field. Considering what the process was doing, there was no need to have the data ordered. Generally, it's better to sort the data on the application side instead of in SQL Server, but in this case not even that was necessary.

There are definitely good cases where you need to use ORDER BY in a database—such as paging scenarios. But if you don’t need things in a particular order, Chad shows that you can potentially save a good deal on performance without an explicit ordering.
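A stripped-down version of the idea, with an AdventureWorks table standing in for the real one: the only difference between the two statements is the ORDER BY, yet the first may need a Sort operator and a memory grant that the second avoids entirely.

-- Sorted: if no index provides this order, the plan picks up a Sort operator
-- (and a memory grant) just to satisfy the ORDER BY.
SELECT SalesOrderID, OrderDate, TotalDue
FROM Sales.SalesOrderHeader
WHERE OrderDate >= '2014-01-01'
ORDER BY OrderDate;

-- Unsorted: same rows, no Sort; let the application order them if it ever needs to.
SELECT SalesOrderID, OrderDate, TotalDue
FROM Sales.SalesOrderHeader
WHERE OrderDate >= '2014-01-01';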

Comments closed

Multi-Pathed Queries

Guy Glanster needs a multi-tool procedure:

This stored procedure, which I created in the AdventureWorks2017 database, has two parameters: @CustomerID and @SortOrder. The first parameter, @CustomerID, affects the rows to be returned. If a specific customer ID is passed to the stored procedure, then it returns all the orders (top 10) for this customer. Otherwise, if it’s NULL, then the stored procedure returns all orders (top 10), regardless of the customer. The second parameter, @SortOrder, determines how the data will be sorted—by OrderDate or by SalesOrderID. Notice that only the first 10 rows will be returned according to the sort order.

So, users can affect the behavior of the query in two ways—which rows to return and how to sort them. To be more precise, there are 4 different behaviors for this query:

1. Return the top 10 rows for all customers sorted by OrderDate (the default behavior)
2. Return the top 10 rows for a specific customer sorted by OrderDate
3. Return the top 10 rows for all customers sorted by SalesOrderID
4. Return the top 10 rows for a specific customer sorted by SalesOrderID

Let’s test the stored procedure with all 4 options and examine the execution plan and the statistics IO.

This is quite common for reporting procedures and Guy shares several patterns, some of which work better than others.
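For reference, here is a rough reconstruction of the kind of catch-all procedure being described (my sketch, not Guy's exact code, and the procedure name is made up). Folding both parameters into one statement is exactly what makes a single cached plan awkward for all four behaviors.

CREATE OR ALTER PROCEDURE dbo.GetTopOrders   -- hypothetical name
    @CustomerID int         = NULL,
    @SortOrder  varchar(20) = 'OrderDate'
AS
BEGIN
    SET NOCOUNT ON;

    SELECT TOP (10)
           SalesOrderID, CustomerID, OrderDate, TotalDue
    FROM   Sales.SalesOrderHeader
    WHERE  @CustomerID IS NULL OR CustomerID = @CustomerID
    ORDER BY
           CASE WHEN @SortOrder = 'OrderDate'    THEN OrderDate    END,
           CASE WHEN @SortOrder = 'SalesOrderID' THEN SalesOrderID END;
END;

The usual remedies are things like OPTION (RECOMPILE), dynamic SQL, or splitting the branches into separate procedures, which is where the patterns Guy compares come in.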

Comments closed

So You Want to Index

Erik Darling has an indexing strategy for querulous normies:

Most queries will have a where clause. I've seen plenty that don't. Some of 'em have surprised the people who developed them far more than they surprised me.

But let’s start there, because it’s a pretty important factor in how you design your indexes. There are all sorts of things that indexes can help, but the first thing we want indexes to do in general is help us locate data.

None of this is groundbreaking but Erik does a really good job of laying out the order in which you want to consider specific factors.
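As a simple illustration of "start with the where clause" (the table and column names here are hypothetical, in the Stack Overflow style Erik tends to use):

-- The query we care about: one equality predicate and one range predicate.
SELECT TOP (100) Id, Title, CreationDate
FROM dbo.Posts
WHERE OwnerUserId = 22656
  AND CreationDate >= '2013-01-01'
ORDER BY CreationDate DESC;

-- An index keyed to that where clause: equality column first, range column second,
-- so the seek can locate the user and then walk the date range in order.
CREATE NONCLUSTERED INDEX IX_Posts_OwnerUserId_CreationDate
ON dbo.Posts (OwnerUserId, CreationDate)
INCLUDE (Title);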

Comments closed

Searching Text which Begins with a Wildcard

Chad Callihan is looking for some data:

Searching for a value or group of values with a wildcard is more than just putting a % on both sides of a text string. If you know you’re looking for all strings in a name field that start with the name “Chad” then you are really shooting yourself in the foot by using ‘%Chad%’ instead of ‘Chad%’ in your query. SQL Server is going to be scanning the table instead of being able to use an index to seek to the data. While that may work to get your result, it’s likely going to take longer and be more invasive than needed. I want to go through an example of how using a reversed column can improve SQL Server’s ability to build a better execution plan.

Read on to see how a computed reversal column can help. For more complex scenarios, an n-gram table (for example, a trigram) might help, though there’s a lot of setup involved there.
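A minimal sketch of the reversed-column approach on a hypothetical table: persist REVERSE() of the searched column, index it, and rewrite the leading-wildcard search as a trailing-wildcard search that can seek.

ALTER TABLE dbo.People
    ADD FirstNameReversed AS REVERSE(FirstName) PERSISTED;

CREATE NONCLUSTERED INDEX IX_People_FirstNameReversed
    ON dbo.People (FirstNameReversed);

-- A leading wildcard forces a scan of FirstName...
SELECT FirstName
FROM dbo.People
WHERE FirstName LIKE '%son';

-- ...but the same search against the reversed column becomes a trailing
-- wildcard, which can use an index seek.
SELECT FirstName
FROM dbo.People
WHERE FirstNameReversed LIKE REVERSE('son') + '%';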

Comments closed

Join Algorithm Selection in Spark

The Hadoop in Real World team takes us through the selection criteria for join types:

There are several factors Spark takes into account before deciding on the type of join algorithm to use to join datasets at runtime.

Spark has the following 5 algorithms to choose from –

1. Broadcast Hash Join
2. Shuffle Hash Join
3. Shuffle Sort Merge Join
4. Broadcast Nested Loop Join
5. Cartesian Product Join (a.k.a Shuffle-and-Replicate Nested Loop Join)

Read on to learn which join types are supported in which circumstances, as well as rules of precedence.
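For reference, most of these strategies can also be requested explicitly with join hints in Spark SQL (3.0 and later); a sketch with made-up table names:

-- Ask for a broadcast hash join: ship the small dimension table to every executor.
SELECT /*+ BROADCAST(d) */ f.order_id, d.customer_name
FROM fact_orders f
JOIN dim_customers d
  ON f.customer_id = d.customer_id;

-- Other available hints:
--   SHUFFLE_HASH            -> shuffle hash join
--   MERGE / SHUFFLE_MERGE   -> shuffle sort merge join
--   SHUFFLE_REPLICATE_NL    -> Cartesian / shuffle-and-replicate nested loop join
-- Without a hint, Spark picks a strategy based on size estimates, join type, and configuration.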

Comments closed

Sorting Pre-Sorted Data

Daniel Hutmacher has an idea:

Whenever SQL Server needs to sort a data stream, it will use the Sort operator to reorder the rows of the stream. Sorting data is an expensive operation because it entails loading part or all of the data into memory and shifting that data back and forth a couple of times. The only time SQL Server doesn’t sort the data is when it already knows the data to be ordered correctly, like when it has already passed a Sort operator or it’s reading from an appropriately sorted index.

But what happens if the data is ordered correctly, but SQL Server doesn’t know about it? Let’s find out.

Click through for the answer.
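One way to see the setup Daniel describes, again against AdventureWorks: the same ORDER BY is free when SQL Server can prove the order from an index and costs a Sort when it can't, even if the rows happen to arrive in the right order.

-- The clustered primary key orders the table by SalesOrderID, so this
-- ORDER BY is satisfied without a Sort operator.
SELECT SalesOrderID, OrderDate
FROM Sales.SalesOrderHeader
ORDER BY SalesOrderID;

-- Order by a column with no ordering guarantee, and a Sort shows up in the
-- plan, even if the data happens to be stored in that order already.
SELECT SalesOrderID, OrderDate
FROM Sales.SalesOrderHeader
ORDER BY OrderDate;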

Comments closed