Query Tuning – Page 42

Quickly Finding Distinct Values with T-SQL

Published 2020-03-17 by Kevin Feasel

Paul White does some amazing things with T-SQL, news at 11:

I will be looking for distinct values in the BountyAmount column of the dbo.Votes table, presented in bounty amount order ascending. The Votes table has just under 53 million rows (52,928,720 to be exact). There are just 19 different bounty amounts, including NULL.
The Stack Overflow 2013 database comes without nonclustered indexes to minimize download time. There is a clustered primary key index on the Id column of the dbo.Votes table. It comes set to SQL Server 2008 compatibility (level 100), but we will start with a more modern setting of SQL Server 2017 (level 140):

Getting the query down from 10.5 seconds to 1ms is crazy.

Comments closed

Tracking Performance of Queries which use RECOMPILE Hints

Published 2020-03-13 by Kevin Feasel

Brent Ozar has some tips if you use RECOMPILE hints frequently:

The first query’s plan stuck around in memory, so it now shows 2 executions, and 2 total rows returned. Its row metrics are correct through the life of the stored procedure’s time in cache.
However, the second query – the one with the recompile hint – has a brand new plan in the cache, but also new metrics. You’re not just recompiling the execution plan, but you’re also not getting query plan metrics here. (That’s fine, and that part I was also kinda aware of.)
But the part that I keep forgetting is that when I’m looking at the stored procedure’s totals in sp_BlitzCache, the total, min, and max values are useless:

If the plan cache isn’t going to help, what will? Brent tells you exactly what.

Comments closed

ASYNC_NETWORK_IO and Execution Plans

Published 2020-03-05 by Kevin Feasel

Jonathan Kehayias dives into an interesting problem:

A few weeks ago, an interesting question was asked on the #SQLHelp hash tag on Twitter about the impact of execution plans on the ASYNC_NETWORK_IO wait type, and it generated some differing opinions and a lot of good discussion.
My immediate answer to this would be that someone is misinterpreting the cause and effect of this, since the ASYNC_NETWORK_IO wait type is encountered when the Engine has results to send over TDS to the client but there are no available TDS buffers on the connection to send them on. Generally speaking, this means that the client side is not consuming the results efficiently, but based on the ensuing discussion I became intrigued enough to do some testing of whether or not an execution plan would actually impact the ASYNC_NETWORK_IO waits significantly.
To summarize: Focusing on ASYNC_NETWORK_IO waits alone as a tuning metric is a mistake. The faster a query executes, the higher this wait type will likely accumulate, even if the client is consuming results as fast as possible. (Also see Greg’s recent post about focusing on waits alone in general.)

Click through for the things Jonathan tested.

Comments closed

Approximate Distinct Count with DAX

Published 2020-03-04 by Kevin Feasel

Gilbert Quevauvilliers runs some performance tests against the approximate distinct count formula in DAX:

I am currently running SQL Server Analysis Services (SSAS) 2019 Enterprise Edition. (This can also be applied to Power BI)
My Fact table has got roughly 950 Million rows stored in
And as mentioned previously it has got over 64 Million distinct users.
The data is queried from SQL Server into SSAS.

Gilbert first checks how close these are and then how much faster the approximate count is.

Comments closed

Multi-Statement TVPs and Time Logged

Published 2020-03-03 by Kevin Feasel

Erik Darling turns the seconds into minutes:

I’ve posted quite a bit about how cached plans can be misleading.
I’m gonna switch that up and talk about how an actual plan can be misleading, too.
In plans that include calling a muti-statement table valued function, no operator logs the time spent in the function. I’ve got a User Voice item for it here.

Click through for the demonstration. If that sounds like something you’d like fixed, vote up the User Voice item.

Comments closed

Sort Keys and Join Types in Amazon Redshift

Published 2020-03-02 by Kevin Feasel

Derik Hammer takes us through query tuning a nasty job on Amazon Redshift:

My team built a process to load from a couple of base tables, in our Amazon Redshift enterprise data warehouse, into an other table which would act as a data mart entity. The data was rolled up and it included some derived fields. The SQL query had some complicity [complexity?, ed.] to it.
This process ran daily and was being killed by our operations team after running for 22 hours.
I stepped in to assist with performance tuning and discovered that join choices, such as INNER vs. OUTER joins have a big impact on whether Redshift can use its sort keys or not.

Click through for more details and what Derik ended up doing.

Comments closed

When the Optimizer Can Use Batch Mode on Row Store

Published 2020-02-27 by Kevin Feasel

Erik Darling looks at some internals for us:

Things like Accelerated Database Recovery, Optimize For Sequential Key, and In-Memory Tempdb Metadata are cool, but they’re server tuning. I love’em, but they’re more helpful for tuning an entire workload than a specific query.
The thing with BMOR is that it’s not just one thing. Getting Batch Mode also allows Adaptive Joins and Memory Grant Feedback to kick in.
But they’re all separate heuristics.

Read on to see the extended events around batch mode to help you determine if it’s possible for the optimizer to use it for a given query.

Comments closed

Fun with Filtering Between Start and End Dates

Published 2020-02-19 by Kevin Feasel

Brent Ozar shows why the StartDate + EndDate pattern is not great for filtering:

If all you need to do is look up the memberships for a specific UserId, and you know the UserId, then it’s a piece of cake. You put a nonclustered index on UserId, and call it a day.
But what if you frequently need to pull all of the memberships that were active on a specific date? That’s where performance tuning gets hard: when you don’t know the UserId, and even worse, you can’t predict the date/time you’re looking up, or if it’s always Right Now.

This is where I advocate pivoting to a series of event records, so instead of a start date and end date, you have an event type (started, expired, cancelled, etc.) and a date. There are other alternatives as well, but it’s a good thought exercise.

Comments closed

Slow Record Cleanup

Published 2020-02-18 by Kevin Feasel

Jared Poche investigates a slow record deletion process:

I encountered a curious issue recently, and immediately knew I needed to blog about it. Having already blogged about implicit conversions and how the TOP operator interacts with blocking operators, I found a problem that looked like the combination of the two.
I reviewed a garbage collection process that’s been in place for some time. The procedure populates a temp table with the key values for the table that is central to the GC. We use the temp table to delete from the related tables, then delete from the primary table. However, the query populating our temp table was taking far too long, 84 seconds when I tested it.

Read on to understand why.

Comments closed

Finding the Query Used in DirectQuery Mode

Published 2020-02-14 by Kevin Feasel

Kasper de Jonge shows us how we can find which query ran in DirectQuery mode to populate a Power BI data set:

When you are optimizing your DirectQuery model and you have done all the optimizations on the model already, you might want to run the queries generated by Power BI by your DBA. He then might be able to do some index tuning or even suggest some model changes. But how do you capture them? There are a few simple ways that I will describe here.

Read on for 3 1/2 such methods.

Comments closed

Category: Query Tuning