
Category: Performance Tuning

SQL Server Performance Office Hours

Erik Darling answers a set of user questions:

You have said that table variables, CTEs, Change Tracking, and Azure Managed Instances all suck. Do you have a full list of “features” to avoid?

Click through for a video of Erik answering questions around deadlocks, terrible things, UTF-8, and more. And I like the nuance behind Erik’s answer to this particular question. It’s easy to say “this thing is awful” and be done with it, but oftentimes, the answer is more of “In this particular circumstance, don’t use this thing because of reasons X, Y, and Z; instead, use this other thing.” That’s a rather different answer.


Table Compaction in Apache Spark

Miles Cole groups things together:

If there’s anything that data engineers agree about, it’s that table compaction is important. Often one of the first big lessons that folks will learn early on is that not compacting tables can present serious performance issues: you’ve gotten your lakehouse pilot approved and it’s been running for a couple of months in production, and you find that both reads and writes are getting slower and slower while your data volumes have not increased drastically. Guess what: you almost surely have a “small file problem”.

What engineers won’t always sing the same tune on is how and when to perform table compaction.

Read on for a dive into the power of compaction (converting a large number of small files into a small number of large files) and plenty of tips along the way.
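As a quick illustration of what compaction looks like in practice, here is a minimal PySpark sketch. The paths, file counts, and table name are mine, not Miles’s, and his post covers rather more sophisticated approaches than this.

```python
# Minimal compaction sketch: rewrite a directory of many small Parquet files
# into a handful of larger ones. Paths and target file counts are illustrative.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("compaction-sketch").getOrCreate()

source_path = "s3://lakehouse/bronze/events/"   # hypothetical path

df = spark.read.parquet(source_path)

# coalesce() reduces the number of output files without a full shuffle;
# pick a target so each file lands somewhere in the ~128 MB to 1 GB range.
(df.coalesce(16)
   .write
   .mode("overwrite")
   .parquet("s3://lakehouse/bronze/events_compacted/"))

# On Delta Lake specifically, the built-in OPTIMIZE command does the same
# job in place:
# spark.sql("OPTIMIZE lakehouse.events")
```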


A List of PostgreSQL Parameters

Semab Tariq has a list:

Have you ever experienced your database slowing down as the amount of data increases? If so, one important factor to consider is tuning PostgreSQL parameters to match your specific workload and requirements. 

PostgreSQL has many parameters because it is designed to be highly flexible and customizable to meet a wide range of use cases and workloads. Each parameter allows you to fine-tune different aspects of the database, such as memory management, query optimization, connection handling, and more. This flexibility helps database administrators to optimize performance based on hardware resources, workload requirements, and specific business needs.

In this blog, I will cover some of the important PostgreSQL parameters, explain their role, and provide recommended values to help you fine-tune your database for better performance and scalability. 

Click through for those parameters, including descriptions, default values, and recommendations.
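If you want to see where your own instance currently stands, a minimal sketch like the following reads a few of the commonly tuned settings from pg_settings. The connection string is a placeholder, and the “right” values depend entirely on your hardware and workload, which is rather the point of Semab’s post.

```python
# Sketch: inspect a few commonly tuned PostgreSQL parameters via pg_settings.
# Connection details are placeholders; the parameter names are standard
# PostgreSQL settings, but good values depend on your hardware and workload.
import psycopg2

conn = psycopg2.connect("dbname=appdb user=postgres host=localhost")

params = [
    "shared_buffers",        # memory for PostgreSQL's shared page cache
    "work_mem",              # memory for a single sort/hash operation
    "effective_cache_size",  # planner hint for total cache available
    "max_connections",       # connection handling
]

with conn, conn.cursor() as cur:
    cur.execute(
        "SELECT name, setting, unit, source FROM pg_settings WHERE name = ANY(%s)",
        (params,),
    )
    for name, setting, unit, source in cur.fetchall():
        print(f"{name}: {setting} {unit or ''} (set from: {source})")

# Persistent changes would normally go through postgresql.conf or
# ALTER SYSTEM SET, followed by a reload or restart, rather than a script.
```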


An Overview of PostgreSQL Performance Monitoring via pgNow

Grant Fritchey announces a product:

I’ve been putting together a new PostgreSQL session called “Performance Monitoring for the Absolute Beginner.” There are several ways to get an understanding of how well your queries are running in PostgreSQL, but, frankly, all of them are a bit of a pain to someone coming from the land of Extended Events (ah, my one true love). Because of this, I saw it as an opportunity to help those just getting going in PostgreSQL. I’ll be presenting it for the first time at Postgres Conference in Orlando on March 19, 2025. Come on by.

Anyhoo, wouldn’t it be nice to maybe have a shortcut, an easier way to look at this information?

Well, there is. Redgate has been working on a completely free tool for leveraging just this sort of data called pgNow. Go here to check it out yourself, but I’ll do a quick run through here.

Click through to see how it works.
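For context, much of the query-level information a tool like this surfaces lives in PostgreSQL’s own views. As a rough sketch (this is not how pgNow itself is implemented, just the raw material you’d otherwise dig through by hand), here is a look at pg_stat_statements, assuming that extension is installed and preloaded; connection details are placeholders and the column names match PostgreSQL 13 and later.

```python
# Sketch: top queries by total execution time from pg_stat_statements.
# Requires the extension in shared_preload_libraries plus
# CREATE EXTENSION pg_stat_statements in the target database.
import psycopg2

conn = psycopg2.connect("dbname=appdb user=postgres host=localhost")

sql = """
SELECT query,
       calls,
       round(total_exec_time::numeric, 2) AS total_ms,
       round(mean_exec_time::numeric, 2)  AS mean_ms,
       rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10
"""

with conn, conn.cursor() as cur:
    cur.execute(sql)
    for query, calls, total_ms, mean_ms, rows in cur.fetchall():
        print(f"{calls:>8} calls  {total_ms:>12} ms total  "
              f"{mean_ms:>10} ms avg  {query[:60]}")
```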


Tips for Scaling Apache Kafka

Narendra Lakshmana Gowda tunes a Kafka cluster:

Apache Kafka is known for its ability to process a huge quantity of events in real time. However, to handle millions of events, we need to follow certain best practices while implementing both Kafka producer services and consumer services.

Before you start using Kafka in your projects, let’s understand when to use Kafka:

Much of the advice is pretty standard for performance tuning in Kafka, like setting batch size and linger time on the producer or managing consumers in a consumer group.
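As a minimal sketch of those producer-side settings, here is what batching and linger time look like with the kafka-python client. The broker address, topic, and the specific numbers are illustrative only; the right values depend on your message sizes and latency tolerance.

```python
# Producer batching sketch: trade a little latency for much higher throughput.
# Broker address, topic name, and values are illustrative.
import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    batch_size=64 * 1024,   # bytes per partition batch (default is 16 KB)
    linger_ms=20,           # wait up to 20 ms to fill a batch before sending
    compression_type="gzip",# compress whole batches on the wire
    acks="all",             # durability vs. latency trade-off
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

for i in range(1_000):
    producer.send("events", {"event_id": i, "payload": "example"})

producer.flush()  # make sure buffered batches are delivered before exiting
```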


Improving Power Query CSV File Performance with Date Columns

Chris Webb makes things go faster:

A few weeks ago I replied to a question on reddit where someone was experiencing extremely slow performance when importing data from a CSV file using Power Query. The original poster worked out the cause of the problem and the solution themselves: they saw that removing all date columns from their query made their Power Query query much faster and that using the Date.FromText function and specifying the date format solved the problem. While I couldn’t reproduce the extreme slowness that was reported I was able to reproduce a performance difference between the two approaches and Curt Hagenlocher of the Power Query team confirmed that this was expected behaviour.

Read on for the example and explanation.
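The fix itself is Power Query M, so read Chris’s post for the real thing, but the underlying principle (telling the parser the exact date format instead of making it guess) carries across tools. As a rough pandas analogue, with a hypothetical file and format string:

```python
# Not Power Query: a pandas analogue of the same idea. Parsing dates with an
# explicit format is cheaper than letting the engine infer the format per value.
import pandas as pd

df = pd.read_csv("sales.csv", dtype={"OrderDate": str})

# Slower: format inference for every value
# df["OrderDate"] = pd.to_datetime(df["OrderDate"])

# Faster: tell the parser exactly what the text looks like
df["OrderDate"] = pd.to_datetime(df["OrderDate"], format="%Y-%m-%d")
```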


Preventing Skew in Teradata

Sudheer Kumar Lagisetty shares some performance tuning advice:

Teradata performance optimization and database tuning are crucial for modern enterprise data warehouses. Effective data distribution strategies and data placement mechanisms are key to maintaining fast query responses and system performance, especially when handling petabyte-scale data and real-time analytics. 

Understanding data distribution mechanisms, workload management, and data warehouse management directly affects query optimization, system throughput, and database performance optimization. These database management techniques enable organizations to enhance their data processing capabilities and maintain competitive advantages in enterprise data analytics.

Click through for some tips around data distribution. This idea becomes important in an MPP architecture.


Speed Differences with Separating Data and Log Files

Brent Ozar performs a test:

I’ve already explained that no, it doesn’t make your database server more reliable – and in fact, it’s the exact opposite. But what about performance?

The answer is going to depend on your hardware and workload, but let’s work through an example. I’ll take the first lab workload from the Mastering Server Tuning class and set it up on an AWS i3en.2xlarge VM, which has 8 cores, 64GB RAM, and two 2.5TB NVMe SSDs. (This was one of the cheapest SQL-friendly VM types with two SSDs, but of course there are any number of ways you could run a test like this, including EBS volumes.)

I would expect cloud versus on-premises answers to be quite different, because cloud services tend to throttle you hard on how much storage throughput you’re allowed to have. For that reason, the results make perfect sense in AWS (or Azure or GCP for that matter), but unless your on-prem solution has hard throttles on IOPS or throughput because your sysadmins are monsters, the limits of performance would be in how hard you can push the drives or your storage controllers.

Ultimately, the most appropriate answer is to test your systems and not rely on expectations, especially if you’re shifting from on-premises to a cloud (or vice versa).


Temp Table Performance in PostgreSQL

Jobin Augustine shares a warning:

PostgreSQL is one of the most powerful database systems in the world. I have always been passionate about its great power, especially its modern SQL language features.

However, that doesn’t mean everything is great. There are areas where it hurts. Novice users unaware of the problem might get into trouble, and I see such things happen very often. This is a note for those who design their solutions around PostgreSQL using temporary tables.

This is very interesting to read, especially in comparison to SQL Server. It’s another example of the adage that just because something works a particular way on one platform, that doesn’t mean it will work the same way on another.
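For anyone who hasn’t used them on the PostgreSQL side, here is a minimal sketch of the pattern Jobin is writing about: a session-scoped temporary table. The connection details and table names are placeholders, and the specific pitfalls are in the linked post; the one caveat I’ll note here is that autovacuum never touches temp tables, so you need to ANALYZE them yourself.

```python
# Sketch: a session-scoped temporary table in PostgreSQL.
# Placeholders throughout; "orders" is a hypothetical source table.
import psycopg2

conn = psycopg2.connect("dbname=appdb user=postgres host=localhost")

with conn, conn.cursor() as cur:
    cur.execute("""
        CREATE TEMP TABLE staging_orders (
            order_id bigint,
            amount   numeric
        ) ON COMMIT PRESERVE ROWS
    """)
    cur.execute("""
        INSERT INTO staging_orders
        SELECT order_id, amount FROM orders WHERE order_date = CURRENT_DATE
    """)
    cur.execute("ANALYZE staging_orders")  # autovacuum will not do this for you
    cur.execute("SELECT count(*), sum(amount) FROM staging_orders")
    print(cur.fetchone())

# The temp table disappears when this session (connection) ends.
```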
