Press "Enter" to skip to content

Category: Performance Tuning

Row Goals In SQL Server 2017

Erik Darling points out a new bonus when you upgrade to SQL Server 2017 CU3:

Don’t go looking in SSMS just yet. If you get an actual or estimated plan from a query in SSMS, it’s not in the XML.

However, if you get them from the plan cache later, you can see them in the XML.

According to People Much Smarter Than Me®, SSMS strips out XML that it doesn’t recognize, so we’ll have to wait for the next version to drop before we can access it easily.

Erik also has links to get more information.
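If you want to go looking yourself, the sketch below searches the plan cache for plans carrying the new attribute. This is a minimal illustration, assuming the attribute shows up in the cached plan XML as EstimateRowsWithoutRowGoal; verify the name against your build.

-- Sketch: find cached plans whose XML contains the new row goal attribute
-- (assumed here to be named EstimateRowsWithoutRowGoal).
SELECT TOP (50)
       st.text       AS query_text,
       qp.query_plan
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle)   AS st
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
WHERE CAST(qp.query_plan AS nvarchar(max)) LIKE N'%EstimateRowsWithoutRowGoal%';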


Performance Testing Post-Updates

Joe Chang has some quick and dirty performance tests comparing SQL Server 2016 SP1 to SQL Server 2016 SP1 CU7 (the first post-Meltdown/Spectre release):

  • Linear sum, SQL Server 2016 SP1 CU7 build 4466 plus OS patches vs. the SP1 base: 9% faster, 12% more CPU efficient.
  • Individual queries range from 24% faster to 0.3% slower.
  • There is probably a penalty in the recent fixes, but fixes since SP1 also made improvements?

Click through for more details.  We’ll have to see a lot more testing to know, but that’s certainly not awful.
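For a rough sense of how to run this kind of quick-and-dirty comparison yourself, capturing CPU time and elapsed time for the same query on each build is enough. This is not Joe's harness, just a minimal sketch with a made-up table name:

SET STATISTICS TIME ON;

SELECT COUNT(*)
FROM dbo.SomeLargeTable;   -- stand-in for your own test query

SET STATISTICS TIME OFF;
-- Compare the "CPU time" and "elapsed time" figures from the Messages tab
-- on the pre-patch and post-patch builds.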


Using Relog To Load Perfmon Counters Into SQL Server

Raul Gonzalez shows how to use the Relog executable to bring Perfmon counters into SQL Server:

Relog is an application, and as such it can connect to SQL Server. In this case the connection is made via an ODBC driver, so we have to create a new DSN on the computer where we want to run Relog.exe, which does not necessarily have to be the same server where SQL Server is installed.

If we click Start and start typing ODBC, it’ll show the two versions available in modern OSes, 32-bit and 64-bit; we need to choose the 64-bit version to make it work.

Since relog.exe is a bit old, the only driver that actually works is the oldest one, but it works, so we choose the SQL Server 10.00.xxx driver.

He also provides a bonus script at the end, which builds a dynamic pivot table of your counter information.
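As a rough illustration of the command's shape (the DSN and log set names here are placeholders): relog MyPerfmonLog.blg -f SQL -o SQL:MyDSN!MyCounterLog. Once the import finishes, relog creates a small set of tables (DisplayToID, CounterDetails, CounterData), so a first look at the data is something like the sketch below; verify the table and column names on your own system.

-- Sketch: raw counter values joined to their counter metadata.
SELECT cd.CounterDateTime,
       det.ObjectName,
       det.CounterName,
       det.InstanceName,
       cd.CounterValue
FROM dbo.CounterData AS cd
JOIN dbo.CounterDetails AS det
  ON det.CounterID = cd.CounterID
ORDER BY cd.CounterDateTime;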


Automatic Tuning In SQL Server

Grant Fritchey is a fan of SQL Server 2017’s automatic tuning feature:

The core of automatic tuning at this point in time (because I’m sure it’s going to evolve) is the ability of the query engine to spot when a query has generated a new plan and that new plan is causing performance to degrade. This is known as a regression in the plan. It comes from bad parameter sniffing, changes in statistics, cumulative updates, or the big notorious one, the cardinality estimator introduced in SQL Server 2014 (it’s been almost four years, I’m not calling it new any more). Spotting a plan regression prior to SQL Server 2016 and the introduction of the Query Store was a major pain. However, now it’s easy. You can spot them by reading the data collected. Further, Microsoft can spot them by reading the data collected, and it does.

If the engine sees that a plan is causing a regression (you have to have Query Store enabled for this), then it will write out a suggestion for fixing it to the new DMV, sys.dm_db_tuning_recommendations. If you further enable automatic tuning on your database, then SQL Server can automatically substitute the last good plan for you, fixing the problem. Further, SQL Server monitors this fix and if, over time, it’s clear that the forced plan is causing problems, it will unforce the plan, again automagically.

Click through for more information, including a query to read from the tuning recommendations DMV.
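Here is a minimal sketch of the moving parts Grant describes: enabling automatic plan correction and then peeking at the recommendations. The JSON paths are the ones documented for the DMV, but check them against your build.

-- Query Store must already be enabled on the database for this to do anything.
ALTER DATABASE CURRENT
SET AUTOMATIC_TUNING (FORCE_LAST_GOOD_PLAN = ON);

-- See what the engine is recommending (or has already applied and is monitoring).
SELECT reason,
       score,
       JSON_VALUE(details, '$.implementationDetails.script') AS force_plan_script,
       JSON_VALUE(state,   '$.currentValue')                 AS current_state
FROM sys.dm_db_tuning_recommendations;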


Data Skip Techniques In Impala

Mostafa Mokhtar, et al, explain a few methods for skipping unneeded data in Impala queries:

Each Apache Parquet file contains a footer where metadata can be stored including information like the minimum and maximum value for each column. Starting in v2.9, Impala populates the min_value and max_value fields for each column when writing Parquet files for all data types and leverages data skipping when those files are read. This approach significantly speeds up selective queries by further eliminating data beyond what static partitioning alone can do. For files written by Hive / Spark, Impala only reads the deprecated min and max fields.

The effectiveness of the Parquet min_value/max_value column statistics for data skipping can be increased by ordering (or clustering) data when it is written by reducing the range of values that fall between the minimum and maximum value for any given file. It was for this reason that Impala 2.9 added the SORT BY clause to table DDL which directs Impala to sort data locally during an INSERT before writing the data to files.

Even if your answer is “throw more hardware at it,” there eventually comes a point where you run out of hardware (or budget).
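To make that concrete, here is an illustrative DDL sketch (table and column names are invented) using the SORT BY clause the authors mention; clustering on the filtered column tightens each file's min/max range so more files can be skipped.

CREATE TABLE events_sorted (
  event_id BIGINT,
  event_ts TIMESTAMP,
  payload  STRING
)
PARTITIONED BY (event_date STRING)
SORT BY (event_ts)
STORED AS PARQUET;

-- A selective predicate on event_ts can now skip whole Parquet files whose
-- min_value/max_value range does not overlap the filter.
SELECT COUNT(*)
FROM events_sorted
WHERE event_date = '2018-01-15'
  AND event_ts BETWEEN '2018-01-15 09:00:00' AND '2018-01-15 10:00:00';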


Fragmentation Can Affect Execution Plans

Jonathan Kehayias explains how index fragmentation can potentially affect execution plans:

Index fragmentation removal and prevention has long been a part of normal database maintenance operations, not only in SQL Server, but across many platforms. Index fragmentation affects performance for a lot of reasons, and most people talk about the effects of random small blocks of I/O that can happen physically to disk-based storage as something to be avoided. The general concern around index fragmentation is that it affects the performance of scans through limiting the size of read-ahead I/Os. It’s based on this limited understanding of the problems that index fragmentation causes that some people have begun circulating the idea that index fragmentation doesn’t matter with Solid State Storage devices (SSDs) and that you can just ignore index fragmentation going forward.

However, that is not the case for a number of reasons. This article will explain and demonstrate one of those reasons: that index fragmentation can adversely impact execution plan choice for queries. This occurs because index fragmentation generally leads to an index having more pages (these extra pages come from page split operations, as described in this post on this site), and so the use of that index is deemed to have a higher cost by SQL Server’s query optimizer.

Let’s look at an example.

Check out the example, but definitely read the comments as there are some good conversations in there.
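If you want to see the page-count side of that argument on your own system, a quick sketch with the standard physical stats DMF shows both fragmentation and the number of pages the optimizer will cost against:

-- More pages generally means a higher estimated cost for scanning the index.
SELECT OBJECT_NAME(ips.object_id) AS table_name,
       i.name                     AS index_name,
       ips.avg_fragmentation_in_percent,
       ips.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
JOIN sys.indexes AS i
  ON  i.object_id = ips.object_id
  AND i.index_id  = ips.index_id
ORDER BY ips.page_count DESC;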


JDBC Roundtrip Costs

Lukas Eder shows the performance cost of doing row-by-row data retrieval:

The obvious difference between the JDBC benchmark and the PL/SQL one is the fact that the JDBC call has to traverse a vast amount of logic, APIs, “barriers” between the JVM and the Oracle kernel before it can actually invoke the really interesting part. This includes:

  • JVM overhead
  • JDBC logic
  • Network overhead
  • Various “outer” layers inside the Oracle database
  • Oracle’s API layers to get into the SQL and PL/SQL execution engines
  • The actual code running in the PL/SQL engine

In Toon’s talk (which again, you should definitely watch), the examples are running SQL code, not PL/SQL code, but the results are the same. The actual logic is relatively cheap inside of the database (as we’ve seen in the PL/SQL only benchmark), but the overhead is significant when calling database logic from outside the database.

Thus: It is very important to minimise that overhead.

This particular example focuses on Oracle and JDBC, but it certainly applies to other database platforms and distributed architectures.
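The same point can be sketched purely in SQL terms: each statement the client issues pays the full stack of layers listed above, so pushing the loop into the database, or better, replacing it with one set-based statement, removes the per-row round trips. An Oracle-flavoured illustration with an invented table name:

-- Row-by-row from the client: each INSERT is a separate round trip through
-- the JDBC, network, and Oracle API layers listed above.
--   INSERT INTO order_staging (id) VALUES (?);   -- executed 100,000 times

-- Pushed into the database: one round trip; the loop runs next to the data.
BEGIN
  FOR i IN 1 .. 100000 LOOP
    INSERT INTO order_staging (id) VALUES (i);
  END LOOP;
  COMMIT;
END;
/

-- Better still, a single set-based statement:
INSERT INTO order_staging (id)
SELECT level FROM dual CONNECT BY level <= 100000;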


Reducing Reads In Queries

Bert Wagner has a few tips for improving query performance by reducing the number of reads:

If SQL Server thinks it is only going to read 1 row of data but instead needs to read many more rows, it might choose a poor execution plan, which results in more reads.

You might get a suboptimal execution plan like above for a variety of reasons, but here are the most common ones I see:

If you had a query that previously ran fine but doesn’t anymore, you might be able to utilize Query Store to help identify why SQL Server started generating suboptimal plans.

Click through for a few more ideas as well.
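If Query Store is on, a hedged starting point for the "why did my reads jump" question is to chart logical reads per plan over time for the query in question (the query_id below is a placeholder):

-- Sketch: average logical reads per plan for a single query of interest,
-- so you can see when a new plan started driving more I/O.
SELECT q.query_id,
       p.plan_id,
       rsi.start_time,
       rs.avg_logical_io_reads,
       rs.count_executions
FROM sys.query_store_query         AS q
JOIN sys.query_store_plan          AS p  ON p.query_id = q.query_id
JOIN sys.query_store_runtime_stats AS rs ON rs.plan_id = p.plan_id
JOIN sys.query_store_runtime_stats_interval AS rsi
  ON rsi.runtime_stats_interval_id = rs.runtime_stats_interval_id
WHERE q.query_id = 42               -- replace with the query you're chasing
ORDER BY rsi.start_time, p.plan_id;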


Powershell Speed Testing

Shane O’Neill shows off a PowerShell script which allows you to simplify performance testing:

Apart from catching up on news during my commute, I only really use notifications for a certain number of hashtags, i.e. #SqlServer, #tsql2sday, #sqlhelp, and #PowerShell.

So during work, every so often a little notification will pop up on the bottom right of my window and I can quickly glance down and decide whether to ignore it or check it out.

That’s what happened with the following tweet:

Click through for Shane’s demo.


Legacy Cardinality Estimation In SQL Server

Kellyn Pot’vin-Gorman explains what the Legacy Cardinality Estimation setting does in SQL Server:

Oracle DBAs have used the CARDINALITY hint for some time, and it should be understood that while this may appear similar, it is actually quite different. As hinting in TSQL is a bit different from PL/SQL, we can compare similar queries to assist:

TSQL
SELECT CustomerId, OrderAddedDate 
FROM OrderTable 
WHERE OrderAddedDate >= '2016-05-01'
OPTION (USE HINT ('FORCE_LEGACY_CARDINALITY_ESTIMATION'));
GO
PL/SQL

Where you might first mistake the CE hint for the following CARDINALITY hint in Oracle:

SELECT /*+ CARDINALITY(ORD,15000) */ ORD.CUSTOMER_ID, ORD.ORDER_DATE 
FROM ORDERS ORD WHERE ORD.ORDER_DATE >= '2016-05-01';

This would be incorrect and the closest hint in Oracle to SQL Server’s legacy CE hint would be the optimizer feature hint:

SELECT /*+ optimizer_features_enable('9.2.0.7') */ ORD.CUSTOMER_ID, ORD.ORDER_DATE FROM ORDERS ORD 
WHERE ORD.ORDER_DATE >= '2016-05-01';

If you’re wondering why I chose a 9i version to force the optimizer to, keep reading and you’ll come to understand.

Read on for the comparative explanation as well as more details on SQL Server’s legacy cardinality estimator hint and database-scoped configuration setting.
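For completeness, the database-scoped configuration flavour mentioned above applies the legacy estimator to every query in the database rather than a single statement:

-- Revert the whole database to the pre-2014 cardinality estimator
-- without changing the compatibility level.
ALTER DATABASE SCOPED CONFIGURATION
SET LEGACY_CARDINALITY_ESTIMATION = ON;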
