Press "Enter" to skip to content

Category: Performance Tuning

Finding a Good Cost Threshold for Parallelism

Jared Westover goes on a quest:

Given modern hardware, you might hear that the default setting of 5 for the Cost Threshold for Parallelism (CTFP) is far too low. However, people are left with a decision: Should they change it or leave it alone? If I change it and the performance gets worse, I’ll be left with egg on my face. What exactly is the benefit of increasing it, especially for smaller-cost queries?

Read on to learn more about what Cost Threshold for Parallelism is and how to set it, and to see a simple example of how the setting can affect you. Jared also has some links to great resources that I highly recommend you check out.
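
If you just want the mechanics, it is a server-level option you change through sp_configure. A minimal sketch follows; the value of 50 is only a common starting point, not a recommendation from Jared's post:

    -- Cost Threshold for Parallelism is an advanced server-level option
    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;

    -- Check the current value (the default is 5)
    EXEC sp_configure 'cost threshold for parallelism';

    -- Raise it; 50 is just a common starting point, not a universal answer
    EXEC sp_configure 'cost threshold for parallelism', 50;
    RECONFIGURE;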

Memoizing Functions with Snowflake

Kevin Wilkie is speaking my language (that is, the language of functional programming):

If you’ve been working with data for several years like I have – mostly using the SQL language – then I have a term for you that other languages, like JavaScript or Python, have had for a few years. The term is “memoizable” and it means, in a nutshell, to remember. A memoizable function caches the results so that it can return the resultset in record time, given the same parameters.

Yeah, it’s a fancy term that basically states, “Instead of calculating the result each time, I’ll just create a lookup table of all possible inputs and what the output is.” It’s really helpful when you have a small number of possible inputs and generating a result takes a while.

Read on to learn more about how this works in Snowflake, including several limitations.
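
For a rough idea of the syntax, here is a minimal sketch with an invented table and function; check Kevin's post and the Snowflake documentation for the current restrictions on what a memoizable UDF may contain:

    -- Hypothetical lookup table of regions each role may see
    CREATE OR REPLACE FUNCTION get_regions_for_current_role()
    RETURNS ARRAY
    MEMOIZABLE
    AS
    $$
      SELECT ARRAY_AGG(region_name)
      FROM role_region_access
      WHERE role_name = CURRENT_ROLE()
    $$;

    -- Repeated calls with the same inputs can be answered from the cached result
    SELECT get_regions_for_current_role();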

Cardinality Estimation Changes Post-SQL Server 2014

Brent Ozar reminds us that small changes happen:

About 10 years ago, Microsoft made changes to the Cardinality Estimator (CE) which caused some problems for SQL Server upgrades. When folks upgraded to SQL Server 2014, they also casually switched their databases’ compatibility level to the latest version, because for years that hadn’t really affected query plans. They just figured they wanted the “latest and greatest” compat level, without regard to the effects. That backfired badly when they suddenly got 2014’s Cardinality Estimation changes.

So for several years, whenever someone upgraded from older versions, and they complained about performance, the stock community answer was, “Change your compatibility level back to what it used to be.” In many cases, that just solved the problems outright, leading to blog posts like this and this.

Even today on SQL Server 2019 & 2022, this advice is still relevant! If you mess around with compatibility levels, you can absolutely change cardinality estimations in ways you didn’t expect.

Read on for more information, including examples where cardinality estimation has improved with SQL Server 2022 and where it has gotten worse.
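
If you do end up experimenting, the knobs look roughly like this (the database, table, and column names are placeholders):

    -- Pin the database to an older compatibility level (110 = SQL Server 2012, which predates the new CE)
    ALTER DATABASE [YourDatabase] SET COMPATIBILITY_LEVEL = 110;

    -- Or keep the newer compatibility level but use the legacy CE database-wide
    ALTER DATABASE SCOPED CONFIGURATION SET LEGACY_CARDINALITY_ESTIMATION = ON;

    -- Or scope the change to a single query
    SELECT c.CustomerName
    FROM dbo.Customers AS c
    OPTION (USE HINT ('FORCE_LEGACY_CARDINALITY_ESTIMATION'));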

Index Rebuilds versus Reorganizations

Sergey Gigoyan opens a can of worms:

When data is modified in a Microsoft SQL Server database, the corresponding indexes are also modified. These modifications cause indexes to become fragmented. Fragmentation means that the logical order of the data inside the index pages does not match its physical order. The higher the fragmentation level, the more SELECT performance suffers. With fragmented indexes, the information is not laid out logically, making the index’s data retrieval operations more time-consuming and resulting in query performance issues. Thus, index fragmentation should be fixed periodically in order to maintain high performance. The rebuild and reorganize index operations are aimed at defragmenting indexes.

Click through to learn a bit about why we may need index maintenance, followed by a comparison between index rebuilding versus index reorganization. All that said, I do not have room in my indexing philosophy (or my heart) for reorganization. Reorganization simply doesn’t provide enough value as it is. If you want a really detailed dive into the topic, Jeff Moden put together an amazing talk that we were lucky enough to be able to record. Specifically, about an hour and a quarter in (link to timestamp), Jeff gets to the topic of reorganization and how little it actually does in practice.
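
For reference, the two operations being compared look like this in T-SQL (the table and index names are placeholders), along with the DMV you would typically use to check fragmentation first:

    -- Check fragmentation for one table's indexes
    SELECT i.name, ips.avg_fragmentation_in_percent, ips.page_count
    FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.Orders'), NULL, NULL, 'LIMITED') AS ips
    JOIN sys.indexes AS i
      ON i.object_id = ips.object_id
     AND i.index_id = ips.index_id;

    -- Reorganize: always online, single-threaded, compacts and reorders leaf-level pages in place
    ALTER INDEX IX_Orders_OrderDate ON dbo.Orders REORGANIZE;

    -- Rebuild: creates a fresh copy of the index and updates its statistics with a full scan
    -- (ONLINE = ON requires an edition that supports online index operations)
    ALTER INDEX IX_Orders_OrderDate ON dbo.Orders REBUILD WITH (ONLINE = ON);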

Power BI Data Type Optimization

Nikola Ilic shows how important it can be to choose the right data types:

For demo purposes, I’ll be using a fact table that contains the data about chats performed by a customer support department of the fictitious company Customer First. This table includes approximately 9 million rows, which is not considered a large table in the context of Power BI and analytical workloads. For the sake of simplicity, let’s pretend that our model consists of only this single table. Finally, a semantic model is configured as an Import mode model. If you want to learn how your data is stored in Power BI, I suggest you start by reading this article first.

Data was loaded into Power BI from the underlying data source (SQL Server database) as-is, without any additional optimizations applied.

Nikola walks through the process of finding the most expensive columns in terms of data size and switching each one to the least precise data type that is still acceptable. One other thing that I commonly see is identity columns or other keys on fact tables. Those are very rarely necessary, because the point of a fact table is typically to aggregate it in some fashion. And these keys are unique (by design), meaning they won’t compress very well and will take up a lot of space. Looking at Nikola’s example, my next question would be, knowing that the name of the table is factChat, does chatID tie to some chat dimension? If not, is it actually necessary for reporting? Again, if not, that could shave off another 60 MB or so from the data model.
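
As a rough illustration of that idea, and with column names that are entirely made up rather than taken from Nikola's model, the trimming can even happen in the source view before the data ever reaches Power BI:

    -- Hypothetical source view for factChat: drop the surrogate key and shrink types
    CREATE OR ALTER VIEW dbo.vw_factChat
    AS
    SELECT
        -- chatID deliberately excluded if nothing downstream joins on it
        CAST(ChatStartTime AS datetime2(0))      AS ChatStartTime,       -- second precision is enough
        CAST(ChatDurationSeconds AS int)         AS ChatDurationSeconds,
        CAST(CustomerSatisfaction AS tinyint)    AS CustomerSatisfaction,
        AgentKey,
        DateKey
    FROM dbo.factChat;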

SQL Server Compilation Time and Storage

Kendra Little explains how storage can affect query compilation time:

Up till now, I’ve thought of compilation time in SQL Server as being dependent only on CPU resources – not something that requires fast storage to be speedy. But that’s not quite right.

Slow storage can result in periodic long compile time in SQL Server. And long compile time not only extends the runtime for the query, it can also result in blocking with waits for compile locks.

Click through for more details, as well as a video by Erik Darling on compile-time locks.
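
If you want to see how much of a query's time goes to compilation rather than execution, SET STATISTICS TIME reports the two separately, and compile memory pressure surfaces as a specific wait type. A quick sketch, where the query and table name are just placeholders:

    -- "SQL Server parse and compile time" is reported separately from execution time
    SET STATISTICS TIME ON;

    SELECT COUNT(*) FROM dbo.Orders;   -- placeholder query

    SET STATISTICS TIME OFF;

    -- Queries waiting on memory just to compile show up under this wait type
    SELECT wait_type, waiting_tasks_count, wait_time_ms
    FROM sys.dm_os_wait_stats
    WHERE wait_type = 'RESOURCE_SEMAPHORE_QUERY_COMPILE';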

Spooling in DirectQuery when Moving through On-Premises Gateway

Chris Webb diagnoses and resolves an issue:

Recently I was working with a customer using DirectQuery mode and where all traffic to the data source had to go through an on-premises data gateway for security reasons. They noticed that report performance got worse when traffic went through the gateway and this was particularly true when Power BI generated SQL queries that returned hundreds of thousands of rows. 

Click through to learn more about what Chris found and how to fix the issue.

A Review of Useful pg_stat_statements

Umair Shahid tracks some statements:

pg_stat_statements is an extension for PostgreSQL that tracks execution statistics of SQL statements. It is designed to provide insight into the performance characteristics of database queries by collecting data on various metrics such as execution time, number of calls, and I/O operations. This extension is immensely useful for database administrators and developers looking to optimize their SQL queries and improve overall database performance.

Click through to learn more about pg_stat_statements, including how to install and configure it, as well as some of the things you can do with it.
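
The basics look something like this; note that the extension must be preloaded before it collects anything, and the column names below are the ones used in PostgreSQL 13 and later:

    -- postgresql.conf (requires a restart):
    --   shared_preload_libraries = 'pg_stat_statements'

    -- Install the extension in the database you want to inspect
    CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

    -- Top 10 statements by total execution time
    SELECT query,
           calls,
           round(total_exec_time::numeric, 2) AS total_ms,
           round(mean_exec_time::numeric, 2)  AS mean_ms,
           rows
    FROM pg_stat_statements
    ORDER BY total_exec_time DESC
    LIMIT 10;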
