
Category: Query Tuning

Approximate Distinct Count with DAX

Gilbert Quevauvilliers runs some performance tests against the approximate distinct count formula in DAX:

I am currently running SQL Server Analysis Services (SSAS) 2019 Enterprise Edition. (This can also be applied to Power BI)

My Fact table has got roughly 950 Million rows stored in

And as mentioned previously it has got over 64 Million distinct users.

The data is queried from SQL Server into SSAS.

Gilbert first checks how close these are and then how much faster the approximate count is.
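To give a feel for the trade-off being tested, here is a minimal Python sketch of one approximate distinct count technique, the K-minimum-values estimator. It is an illustration only: it is not the algorithm that DAX or SSAS uses internally, and the names `hash64` and `approx_distinct_count` and the default `k` are assumptions made for this example.

```python
import hashlib
import heapq

HASH_SPACE = 2 ** 64  # size of the 64-bit hash range


def hash64(value):
    """Map any value to a stable 64-bit integer hash."""
    digest = hashlib.blake2b(str(value).encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def approx_distinct_count(values, k=1024):
    """Estimate the distinct count while remembering only the k smallest hashes."""
    heap = []        # max-heap via negation; holds the k smallest hashes seen
    members = set()  # the same hashes, for fast duplicate checks
    for value in values:
        h = hash64(value)
        if h in members:
            continue
        if len(heap) < k:
            heapq.heappush(heap, -h)
            members.add(h)
        elif h < -heap[0]:
            # New hash is smaller than the largest one kept: swap it in.
            evicted = -heapq.heappushpop(heap, -h)
            members.discard(evicted)
            members.add(h)
    if len(heap) < k:
        return len(heap)  # fewer than k distinct hashes seen, so the count is exact
    kth_smallest = -heap[0]
    # If hashes are spread uniformly, the k-th smallest one sits at about k / n
    # of the hash range, which gives the estimate below.
    return int((k - 1) * HASH_SPACE / kth_smallest)


if __name__ == "__main__":
    user_ids = [i % 50_000 for i in range(200_000)]  # 50,000 distinct users
    print("exact: ", len(set(user_ids)))
    print("approx:", approx_distinct_count(user_ids))
```

The sketch keeps a fixed amount of state no matter how many rows it scans, which is the general reason approximate counts can be faster and lighter on memory than exact ones; a larger `k` buys accuracy at the cost of more state.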


Multi-Statement TVFs and Time Logged

Erik Darling turns the seconds into minutes:

I’ve posted quite a bit about how cached plans can be misleading.

I’m gonna switch that up and talk about how an actual plan can be misleading, too.

In plans that include calling a multi-statement table-valued function, no operator logs the time spent in the function. I’ve got a User Voice item for it here.

Click through for the demonstration. If that sounds like something you’d like fixed, vote up the User Voice item.


Sort Keys and Join Types in Amazon Redshift

Derik Hammer takes us through query tuning a nasty job on Amazon Redshift:

My team built a process to load from a couple of base tables, in our Amazon Redshift enterprise data warehouse, into another table which would act as a data mart entity. The data was rolled up and it included some derived fields. The SQL query had some complicity [complexity?, ed.] to it.

This process ran daily and was being killed by our operations team after running for 22 hours.

I stepped in to assist with performance tuning and discovered that join choices, such as INNER vs. OUTER joins, have a big impact on whether Redshift can use its sort keys or not.

Click through for more details and what Derik ended up doing.


When the Optimizer Can Use Batch Mode on Row Store

Erik Darling looks at some internals for us:

Things like Accelerated Database Recovery, Optimize For Sequential Key, and In-Memory Tempdb Metadata are cool, but they’re server tuning. I love ’em, but they’re more helpful for tuning an entire workload than a specific query.

The thing with BMOR is that it’s not just one thing. Getting Batch Mode also allows Adaptive Joins and Memory Grant Feedback to kick in.

But they’re all separate heuristics.

Read on to see the extended events around batch mode to help you determine if it’s possible for the optimizer to use it for a given query.


Fun with Filtering Between Start and End Dates

Brent Ozar shows why the StartDate + EndDate pattern is not great for filtering:

If all you need to do is look up the memberships for a specific UserId, and you know the UserId, then it’s a piece of cake. You put a nonclustered index on UserId, and call it a day.

But what if you frequently need to pull all of the memberships that were active on a specific date? That’s where performance tuning gets hard: when you don’t know the UserId, and even worse, you can’t predict the date/time you’re looking up, or if it’s always Right Now.

This is where I advocate pivoting to a series of event records, so instead of a start date and end date, you have an event type (started, expired, cancelled, etc.) and a date. There are other alternatives as well, but it’s a good thought exercise.
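As a rough illustration of the event-record idea, the sketch below uses Python's built-in `sqlite3` module. The `MembershipEvents` table, its index, the sample rows, and the `active_users_on` helper are all hypothetical and are not taken from Brent's post; it assumes at most one event per user per day and shows only the shape of the data and one way to ask "who was active on this date?", not how SQL Server would perform.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One row per event instead of one row per membership with StartDate/EndDate.
    CREATE TABLE MembershipEvents (
        UserId    INTEGER NOT NULL,
        EventType TEXT    NOT NULL,  -- 'started', 'expired', 'cancelled', ...
        EventDate TEXT    NOT NULL   -- ISO dates compare correctly as text
    );
    CREATE INDEX IX_MembershipEvents_User_Date
        ON MembershipEvents (UserId, EventDate, EventType);
""")
conn.executemany(
    "INSERT INTO MembershipEvents (UserId, EventType, EventDate) VALUES (?, ?, ?)",
    [
        (1, "started",   "2020-01-01"),
        (1, "expired",   "2020-06-30"),
        (2, "started",   "2020-03-15"),
        (3, "started",   "2020-02-01"),
        (3, "cancelled", "2020-02-20"),
    ],
)


def active_users_on(conn, as_of):
    """Return users whose most recent event on or before as_of is 'started'."""
    rows = conn.execute(
        """
        SELECT e.UserId
        FROM MembershipEvents AS e
        WHERE e.EventType = 'started'
          AND e.EventDate = (
                SELECT MAX(e2.EventDate)
                FROM MembershipEvents AS e2
                WHERE e2.UserId = e.UserId
                  AND e2.EventDate <= :as_of
          )
        ORDER BY e.UserId
        """,
        {"as_of": as_of},
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    for day in ("2020-02-10", "2020-04-01", "2020-07-01"):
        print(day, active_users_on(conn, day))
```

Each event row carries a single date, so the lookup becomes a search on one date column per user rather than a range test across two columns.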


Slow Record Cleanup

Jared Poche investigates a slow record deletion process:

I encountered a curious issue recently, and immediately knew I needed to blog about it. Having already blogged about implicit conversions and how the TOP operator interacts with blocking operators, I found a problem that looked like the combination of the two.

I reviewed a garbage collection process that’s been in place for some time. The procedure populates a temp table with the key values for the table that is central to the GC. We use the temp table to delete from the related tables, then delete from the primary table. However, the query populating our temp table was taking far too long, 84 seconds when I tested it.

Read on to understand why.


Finding the Query Used in DirectQuery Mode

Kasper de Jonge shows us how we can find which query ran in DirectQuery mode to populate a Power BI data set:

When you are optimizing your DirectQuery model and you have done all the optimizations on the model already, you might want to run the queries generated by Power BI by your DBA. He then might be able to do some index tuning or even suggest some model changes. But how do you capture them? There are a few simple ways that I will describe here.

Read on for 3 1/2 such methods.


Interleaved Execution with SQL Server

Milos Radivojevic takes us through improvements with interleaved execution in SQL Server:

As you might know, Interleaved Execution is a member of the Intelligent Query Processing family of features. It was introduced with SQL Server 2017 (as a part of Adaptive Query Processing). It is designed to improve the performance of queries referencing multi-statement table-valued functions (MSTVF). Actually, it currently addresses only queries using MSTVF, but is hopefully designed for much more. The query optimizer usually has two issues with queries using MSTVF:

MSTVF is a black-box for the optimizer; it does not know what’s inside, it cannot perform cross-statement optimization (as is the case with inline TVFs) and it assumes it is a cheap and fast operation
MSTVF has a fixed cardinality of 100 (prior to SQL Server 2014, it was 1)

Interleaved execution does not improve the first issue (MSTVF is still a black-box for the optimizer), but solves the cardinality issue.

Read on to understand how this second aspect has changed for the better.


SQL Server and Query Costs

Jared Poche explains some of the ideas behind the costing algorithm in SQL Server:

One thing to remember is that cost in SQL Server is always an estimate. This is a number SQL Server calculates when considering multiple potential plans to determine which would be the best. But the number of rows it expects a given operation to return or how many times that operation runs can be off. All of that is based on statistics.

It doesn’t then go back and update the cost number later if those numbers were incorrect. So while we can use the cost as an indicator of which query or operator we should focus on, don’t completely tunnel-vision that one thing.

This kind of cost mismatch allows something to look awful on an execution plan but not actually be a problem, or (in the case of most user-defined functions prior to SQL Server 2019) vice versa.


The Cost of Sorting in Stored Procedures

Monica Rathbun wants us to think about whether we really need that ORDER BY clause:

We know that sorting can be one of the most expensive things in an execution plan as shown below. However, we continue to do ORDER BYs repeatedly. Yes, I 100% agree that there is a need to sort a result set and that this should be done in the procedure for good reason, but my concern is having multiple sorts, erroneous sorts, and the sorts that can be done elsewhere. These are the ones that waste resources and can stifle performance.

Click through for a demo showing that this does make a difference.
