Press "Enter" to skip to content

Category: Query Tuning

PFS Contention and Heaps

Uwe Ricken continues a series on heaps in SQL Server:

The PFS page “can” become a bottleneck for a heap if many data records are entered in the heap in the shortest possible time. How often the PFS page has to be updated depends mostly on the data record’s size to be saved.

This procedure does not apply to clustered indexes since data records in an index must ALWAYS be “sorted” into the data volume according to the defined index value. Therefore, the search for a “free” space is not carried out via the PFS page but via the value of the key attribute!

Read on for more detail.

Comments closed

MAX Type Variables in WHERE Clauses and Recompile

Erik Darling puts on his lab coat and goggles:

After blogging recently (maybe?) about filters, there was a Stack Exchange question about a performance issue when a variable was declared with a max type.

After looking at it for a minute, I realized that I had never actually checked to see if a recompile hint would allow the optimizer more freedom when dealing with them.

Read on for Erik’s findings.

Comments closed

The Merge Interval Operator

Hugo Kornelis looks at another execution plan operator:

The Merge Interval operator reads dynamic seek range specifications, checks to see if their specified ranges overlap, and if so combines the overlapping ranges into one new range.

One typical use case is for a query that uses multiple BETWEEN specifications, connected with OR. When these ranges overlap, they must be combined into a single range. This saves performance, but more important is that it prevents rows that satisfy both range specifications from being returned multiple times. When the boundaries of the BETWEEN are given as constants, the optimizer analyzes for overlaps and combines ranges if needed when compiling the query. But when the boundaries of the BETWEEN specifications are only known at run-time (variables, column references), the Merge Interval operator is used for this task.

Click through to see how it works.

Comments closed

Batch Mode with Window Functions and Parallelism

Erik Darling has a two-parter on how using batch mode processing when working with window functions can lead to better performance. Part 1 sets the stage:

If you ask people who tune queries why batch mode is often much more efficient with windowing functions, they’ll tell you about the window aggregate operator.

That’s all well and good, but there’s another, often sneaky limitation of fully row mode execution plans with windowing functions.

Let’s go take a look!

Part 2 identifies the culprit:

When queries go parallel, you want them to be fast. Sometimes they are, and it’s great.

Other times they’re slow, and you end up staring helplessly at a repartition streams operator.

Check out both of these posts.

Comments closed

Batch Mode with Temp Tables

Erik Darling continues receiving big paydays from Big Temp Table:

When you have queries that need to process a lot of data, and probably do some aggregations over that lot-of-data, batch mode is usually the thing you want.

Originally introduced to accompany column store indexes, it works by allowing CPUs to apply instructions to up to 900 rows at a time.

It’s a great thing to have in your corner when you’re tuning queries that do a lot of work, especially if you find yourself dealing with pesky parallel exchanges.

Read on to see how you can create a temp table which triggers batch mode processing fairly easily.

Comments closed

Parallel Inserts into Temp Tables

Erik Darling explains the pre-conditions for parallel insertion into temporary tables:

If you have a workload that uses #temp tables to stage intermediate results, and you probably do because you’re smart, it might be worth taking advantage of being able to insert into the #temp table in parallel.

Remember that you can’t insert into @table variables in parallel, unless you’re extra sneaky. Don’t start.

If your code is already using the SELECT ... INTO #some_table pattern, you’re probably already getting parallel inserts. But if you’re following the INSERT ... SELECT ... pattern, you’re probably not, and, well, that could be holding you back.

There are enough pre-conditions that this becomes a decision rather than an automatic. Especially if you’re dealing with temp tables with indexes and want to take advantage of temp table reuse, which I believe precludes changing the structure of the table (including adding indexes) after creation.

Comments closed

The Performance Cost of AT TIME ZONE

Erik Darling shows that AT TIME ZONE does not scale well when used in filters against columns:

Databases really do make you pay dearly for mistakes, and new linguistic functionality is not implemented with performance in mind.

I’ve written before about how to approach date math in where clauses: Where To Do Date Math In Your Where Clause

And it turns out that this lesson is brutally true if you need to pass time zones around, too.

Read the whole thing. In this respect, AT TIME ZONE is similar to pretty much all other date operators and functions.

Comments closed

Identifying Straggler Tasks in Spark Applications

Ajay Gupta clues us in on a process:

What Is a Straggler in a Spark Application?

A straggler refers to a very very slow executing Task belonging to a particular stage of a Spark application (Every stage in Spark is composed of one or more Tasks, each one computing a single partition out of the total partitions designated for the stage). A straggler Task takes an exceptionally high time for completion as compared to the median or average time taken by other tasks belonging to the same stage. There could be multiple stragglers in a Spark Job being present either in the same stage or across multiple stages. 

Read on to understand the consequences and causes of these straggler tasks, as well as what you can do about them.

Comments closed