Press "Enter" to skip to content

Category: Performance Tuning

Index Rebuilds versus Reorganizations

Sergey Gigoyan opens a can of worms:

When data is modified in a Microsoft SQL database, the corresponding indexes are also modified. These modifications cause indexes to become fragmented. Fragmentation means that the logical order of the data inside the index pages does not match its physical order. The more the fragmentation level increases, the more performance is affected for SELECT statements. With fragmented indexes, the information is not spread logically, making the index’s data retrieval operations more time-consuming and resulting in query performance issues. Thus, fragmentation of the indexes should be fixed periodically in order to maintain high performance. Rebuild and Reorganize index operations are aimed at defragmenting indexes.

Click through to learn a bit about why we may need index maintenance, followed by a comparison of index rebuilding versus index reorganization. All that said, I do not have room in my indexing philosophy (or my heart) for reorganization. Reorganization simply doesn’t provide enough value to be worth the effort. If you want a really detailed dive into the topic, Jeff Moden put together an amazing talk that we were lucky enough to record. Specifically, about an hour and a quarter in (link to timestamp), Jeff gets to the topic of reorganization and how little it actually does in practice.
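For those who do want to roll their own maintenance, a minimal sketch of checking fragmentation and then reorganizing or rebuilding looks something like the following. The table and index names are placeholders, and ONLINE rebuilds require Enterprise Edition or Azure SQL Database.

-- Check fragmentation for one table's indexes (dbo.Orders and IX_Orders_OrderDate are stand-in names)
SELECT i.name AS index_name,
       ips.avg_fragmentation_in_percent,
       ips.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(N'dbo.Orders'), NULL, NULL, 'LIMITED') AS ips
JOIN sys.indexes AS i
    ON i.object_id = ips.object_id
   AND i.index_id = ips.index_id;

-- Reorganize: lightweight, always online, compacts and reorders leaf pages in place
ALTER INDEX IX_Orders_OrderDate ON dbo.Orders REORGANIZE;

-- Rebuild: creates a fresh copy of the index and updates its statistics
ALTER INDEX IX_Orders_OrderDate ON dbo.Orders REBUILD WITH (ONLINE = ON);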


Power BI Data Type Optimization

Nikola Ilic shows how important it can be to choose the right data types:

For demo purposes, I’ll be using a fact table that contains the data about chats performed by a customer support department of the fictitious company Customer First. This table includes approximately 9 million rows, which is not considered a large table in the context of Power BI and analytical workloads. For the sake of simplicity, let’s pretend that our model consists of only this single table. Finally, a semantic model is configured as an Import mode model. If you want to learn how your data is stored in Power BI, I suggest you start by reading this article first.

Data was loaded into Power BI from the underlying data source (SQL Server database) as-is, without any additional optimizations applied.

Nikola walks through the process of finding the most expensive columns in terms of data size and using the least precise acceptable value. One other thing that I commonly see is identity columns or other keys on fact tables. Those are very rarely necessary, because the point of a fact table is typically to aggregate it in some fashion. And these keys are unique (by design), meaning they won’t compress very well and will take up a lot of space. Looking at Nikola’s example, my next question would be, knowing that the name of the table is factChat, does chatID tie to some chat dimension? If not, is it actually necessary for reporting? Again, if not, that could shave off another 60 MB or so from the data model.
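If chatID really is unnecessary, the cheapest fix is often not to import it at all, for example by pointing the semantic model at a view that selects only the columns reporting needs. The column names below are purely illustrative, since all we know is that the table is called factChat.

-- Hypothetical view over the fact table that leaves out the identity-style key
CREATE VIEW dbo.vw_factChat
AS
SELECT ChatStartedDateKey,
       CustomerKey,
       AgentKey,
       ChatDurationSeconds
FROM dbo.factChat;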


SQL Server Compilation Time and Storage

Kendra Little explains how storage can affect query compilation time:

Up till now, I’ve thought of compilation time in SQL Server as being dependent only on CPU resources, not something that requires fast storage to be speedy. But that’s not quite right.

Slow storage can result in periodic long compile time in SQL Server. And long compile time not only extends the runtime for the query, it can also result in blocking with waits for compile locks.

Click through for more details, as well as a video by Erik Darling on compile-time locks.
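If you want to see compile time for yourself, SET STATISTICS TIME reports parse and compile time per query, and the wait statistics DMV shows whether compilation is waiting on resources. A quick sketch:

-- Per-query: the Messages tab reports "SQL Server parse and compile time"
SET STATISTICS TIME ON;
SELECT COUNT(*) FROM sys.objects;
SET STATISTICS TIME OFF;

-- Server-wide: waits that point at compilation memory pressure
SELECT wait_type, waiting_tasks_count, wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type = N'RESOURCE_SEMAPHORE_QUERY_COMPILE';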


Spooling in DirectQuery when Moving through On-Premises Gateway

Chris Webb diagnoses and resolves an issue:

Recently I was working with a customer using DirectQuery mode where all traffic to the data source had to go through an on-premises data gateway for security reasons. They noticed that report performance got worse when traffic went through the gateway, and this was particularly true when Power BI generated SQL queries that returned hundreds of thousands of rows.

Click through to learn more about what Chris found and how to fix the issue.


A Review of Useful pg_stat_statements

Umair Shahid tracks some statements:

pg_stat_statements is an extension for PostgreSQL that tracks execution statistics of SQL statements. It is designed to provide insight into the performance characteristics of database queries by collecting data on various metrics such as execution time, number of calls, and I/O operations. This extension is immensely useful for database administrators and developers looking to optimize their SQL queries and improve overall database performance.

Click through to learn more about pg_stat_statements, including how to install and configure it, as well as some of the things you can do with it.
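As a rough sketch of the workflow: once shared_preload_libraries includes pg_stat_statements and the server has restarted, you create the extension and query the view. Column names below match PostgreSQL 13 and later (older versions use total_time instead of total_exec_time).

-- Create the extension in the database you want to inspect
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Top 10 statements by total execution time
SELECT calls,
       round(total_exec_time::numeric, 2) AS total_ms,
       round(mean_exec_time::numeric, 2)  AS mean_ms,
       rows,
       query
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;

-- Reset the counters when you want a clean baseline
SELECT pg_stat_statements_reset();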


Testing Postgres Configuration Parameter Performance Changes

Semab Tariq follows up on a prior blog post:

In a previous blog post, we configured an EC2 instance and installed PostgreSQL on it. After the initial setup, we ran pgbench to measure the initial TPS (transactions per second). Then, we tuned PostgreSQL and reran pgbench to see how much we could improve the TPS. To tune PostgreSQL, we adjusted various parameters based on the system’s available resources. In this blog, we will identify which of those parameters contributed the most to the performance improvements.

If you want to read the initial blog post, you can find it here: Key PostgreSQL Configuration Parameters for Enhanced Performance.

We will use the same instance size on AWS EC2 as before, which is t2.large.

I suppose it shouldn’t matter very much if you’re running a performance test over a significant timespan, though it was an interesting choice to use a burstable VM type for the test.
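The general pattern for this kind of test is to change one parameter at a time, reload (or restart, for parameters such as shared_buffers), rerun the same pgbench workload, and compare TPS. A sketch of the SQL side, with purely illustrative values:

-- Adjust a parameter (the value here is only an example)
ALTER SYSTEM SET work_mem = '64MB';

-- Pick up reloadable parameters without a restart
SELECT pg_reload_conf();

-- Confirm the running value and whether a restart is still pending
SELECT name, setting, unit, pending_restart
FROM pg_settings
WHERE name IN ('work_mem', 'shared_buffers', 'effective_cache_size');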


Telegraf Performance Optimization

Riya shares a few tips on making Telegraf stream data more efficiently:

As businesses grow and their infrastructures become more complex, monitoring becomes a critical component of maintaining system health and performance. Telegraf, an open-source server agent for collecting and sending metrics and events from databases, systems, and IoT sensors, is widely used for this purpose. However, handling high volumes of metrics can strain resources and degrade performance. This blog will explore strategies for optimizing Telegraf’s performance when dealing with high-volume metrics.

Click through for an architectural overview and five things you can do to optimize performance.


Troubleshooting Transactional Replication Latency in SQL Server

Bobirmirzo Arslanov has a two-parter around troubleshooting transactional replication latency issues. Part 1 looks at architecture and some of the distribution tables:

Before you dive into solving any issue, you need to fully understand the type of environment you have, as there might have been changes you are unaware of. An easy way to do that is to run the Script Replication Topology script from the sqlserver-parikh/SQLServer repository on GitHub, which gives output like the below.

Part 2 digs in deeper:

If wait time is high compared to CPU time, check the wait type and troubleshoot accordingly. For example, in the example above we faced the MEMORY_ALLOCATION_EXT wait type, but the duration is 0, so we are not waiting.

If CPU time is higher, this means the log reader thread is running, but latency is being observed because you have a high load.

Check out both posts. I’d really like to see a similar article for merge replication, as it’s a lot harder to troubleshoot.
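One handy complement to the scripts in those posts: tracer tokens give a direct measurement of latency from publisher to distributor to subscriber. The publication name below is a placeholder.

-- Run at the publisher, in the published database
DECLARE @token_id bigint;

EXEC sys.sp_posttracertoken
    @publication = N'MyPublication',      -- placeholder publication name
    @tracer_token_id = @token_id OUTPUT;

-- Give the token time to flow through the log reader and distribution agents
WAITFOR DELAY '00:00:30';

-- Latency from publisher to distributor and on to each subscriber
EXEC sys.sp_helptracertokenhistory
    @publication = N'MyPublication',
    @tracer_id = @token_id;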
