Press "Enter" to skip to content

Category: Query Tuning

The Importance of Cardinality

Bert Wagner shows us why cardinality is important to understand when indexing data:

When building indexes for your queries, the order of your index key columns matters.  SQL Server can make the most effective use of an index if the data in that index is stored in the same order as what your query requires for a join, where predicate, grouping, or order by clause.

But if your query requires multiple key columns because of multiple predicates (eg. WHERE Color = ‘Red’ AND Size= ‘Medium’), what order should you define the columns in your index key column definition?

One of my favorite books for query tuning is a bit long in the tooth at this point but remains quite relevant, and a key point there is to look for ways to drop the largest percent of rows as soon as possible. This applies for good indexes as well: they’ll let you ignore as large a percentage of your irrelevant data as you can, as soon as possible.

Comments closed

Disentangling Nested Functions

Brent Ozar takes us through a realistic but nasty scenario:

When I do performance tuning for clients, I really pride myself on making as few changes as possible in order to make a tremendous difference. I consider it a failure when I have to tell someone to rewrite a bunch of queries from scratch.
However, there are some cases where I just can’t work around a perfect storm of anti-patterns. To understand why, let’s take an example. I’ve kept the general idea the same, but I’ve rewritten the entire example in the Stack Overflow database to protect the innocent.

I’ve seen cases similar to what Brent has. Developers understand encapsulation and minimizing code repetition, so they naturally want to do that with SQL, but the optimizer eventually gives up and picks a terrible plan. DRY is great for application code and normalization, but unfortunately, it’s not always great for T-SQL.

Comments closed

Impossible Joins And Terrible Execution Plans

Erik Darling shows us a case where the same query can be nice and fast, but change one parameter and suddenly performance goes out the window:

In the original plan, the TOP asked for rows, and quickly got them.
In the second plan, the TOP kept asking for rows, getting them from the Votes table, and then losing them on the join to Posts.
There was no parameter sniffing, there were no out of date stats, no blocking, or any other oddities. It’s just plain bad luck because of the data’s relationship.

Read the whole thing.

Comments closed

Calculating Median In SQL Server 2019

Tomaz Kastrun shows that batch aggregation mode on window functions allow PERCENTILE_CONT finally to become useful:

Next query, for median calculation was a window function query.


SELECT DISTINCT

PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY c1)

OVER (PARTITION BY (SELECT 1)) AS MedianCont

FROM t1


To my surprise, the performance was even worse, and at this time, I have to say, I was running this on SQL Server 2017 with CU7. But luckily, I had a SQL Server 2019 CTP 2.0 also installed and here, with no further optimization the query ran little over 1 second.

I’ve warned people away from this function for all but tiny data sets because of how poorly it performs. With SQL Server 2019, I might be able to recommend it.

1 Comment

Visualizing Hash Match Joins

Bert Wagner continues his series on visualizing physical join operators:

Hash Match joins are the dependable workhorses of physical join operators.
While Nested Loops joins will fail if the data is too large to fit into memory, and Merge Joins require that the input data are sorted, a Hash Match will join any two data inputs you throw at it (as long as the join has an equality predicate and you have enough space in tempdb).  

Bert has some great animated GIFs too.

Comments closed

Deep Dive On The Adaptive Join Operator

Hugo Kornelis covers the Adaptive Join operator:

The Adaptive Join operator was added in SQL Server 2017 as an alternative to the other join operators: Nested Loops (ideal for joining a small data stream with a cheap input), Hash Match (most effective for joining large unsorted sets) and Merge Join (ideal for joining data streams that are sorted by the join key). It is intended to be used when there is no efficient way to fulfill the order requirement of the Merge Join, and the optimizer cannot reliably predict which of the remaining algorithms (Hash Match or Nested Loops) would perform best.
Because it has to be able to join the data using either the Nested Loops or the Hash Match algorithm, Adaptive Join suffers from the combined restrictions of these operators. As such, Adaptive Join supports only four logical join operations: inner join, left outer join (but not the probed version), left semi join, and left anti semi join;  it requires at least one equality-based join predicate, it uses lots of memory, and it is semi-blocking.

Read on for a detailed look at this operator.

Comments closed

Query Tuning In CosmosDB

Hasan Savran explains how we can tune queries in CosmosDB:

This is most common question in my talks about Cosmos DB from DBAs. Cosmos DB is a managed database, this does not mean that you cannot tune up your queries. But the way you tune up the queries is nothing like SQL Server.

       First you need to be sure that you configured your Cosmos DB containers right. What do I mean with that? You should pick the right partition key before you start to tune up any of your queries. Tuning up your queries is not going to help you in long run if you selected a wrong partition key when you created Cosmos DB containers. Throughput value is another value you need to worry about, the good news about the throughput is, you can change it if you need to. You cannot change your partition key!

It’s a whole different world over there.

Comments closed

The Bitmap Operator

Hugo Kornelis describes a new operator:

The Bitmap operator is used to build a bitmap that, based on a hash, represents which values may be present in a data flow. Due to the chance of hash collisions in the hash function used, the Bitmap process can produce false positives but not false negatives – so a match based on a bitmap is not guaranteed to be a match to the actual data, but a non-match based on a bitmap is guaranteed to not be a match in the actual data.
The generated bitmap is typically used in other operators to remove rows for which there is no match in the bitmap, and hence guaranteed no match in the original set of data processed by the Bitmap operator. The use of Bitmap operators is most common in execution plans for star join queries in large data warehouses. An example can be seen here.

Click through for details on how it works and plenty of good information on it.

Comments closed

What Happens With Multiple Missing Indexes

Arthur Daniels shows us what happens when there are multiple missing indexes in an execution plan:

This is missing index request #1, and by default, this is the only missing index we’ll see by looking at the graphical execution plan. There’s actually a missing index request #2, which we can find in the XML (I know, it’s a little ugly to read. Bear with me).

I am of two minds on this. It probably should be easier to see multiple index candidates, but there’s already so much risk of people just copy-pastaing missing index recommendations that adding more seems like a bad idea.

Comments closed