Press "Enter" to skip to content

Category: Performance Tuning

Using a File Cache in SSIS

Andy Brownsword makes those SSIS jobs run faster:

Last week we looked at using a cache to improve lookup performance. We saw how a cache improves performance by letting us reuse reference data repeatedly. That used a regular cache, but it’s not the only option available to us.

In this post we’re going to look at the File Cache option which can achieve the same results – plus a little more.

My experience with file caches is that they’re a bit finicky, but once you get them running, they can provide a significant speed-up for data enrichment tasks.


Why Not Use VARCHAR(MAX) for Everything?

David Fowler explains:

When I mentioned to the developer that it’s probably not the best idea, he turned around and asked me, ‘why not?’

It was a genuine question. Why shouldn’t we just spam VARCHAR(MAX) over all of our columns? On the upside, it would get rid of all those annoying issues that crop up when we try to insert a value that overflows the datatype.

Click through for a video as well as a blog post laying out some of the problems with using VARCHAR(MAX) all willy-nilly.
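To make one of those problems concrete, here’s a minimal sketch (the table and column names are invented for illustration): VARCHAR(MAX) columns can’t be used as index key columns, so sprinkling MAX everywhere quietly rules out ordinary indexing.

  -- Hypothetical table: an e-mail address never needs MAX
  CREATE TABLE dbo.Customers
  (
      CustomerID INT IDENTITY(1,1) PRIMARY KEY,
      Email      VARCHAR(MAX) NOT NULL
  );

  -- Fails with error 1919: the column "is of a type that is invalid
  -- for use as a key column in an index"
  CREATE NONCLUSTERED INDEX IX_Customers_Email
      ON dbo.Customers (Email);

Declared as, say, VARCHAR(320), the same index would be created without complaint.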


Performance Testing the pg_tde Extension

Transparent data encryption is now available for PostgreSQL via Percona’s pg_tde extension, and Andreas Scherbaum has some performance measurements:

The performance impact of pg_tde by Percona for encrypted data pages is measurable across all tests, but not very large. The performance impact of encrypting WAL pages is about 20% for write-heavy tests. The tests were run with a Release Candidate (RC) of the extension; however, the WAL encryption feature is still in Beta.

Andreas also has a post on the testing specifics:

This test was run on a dedicated physical server, to avoid external influences and fluctuations from virtualization.

The server has an Intel(R) Xeon(R) Gold 5412U CPU with 48 cores, 256 GB RAM, and a 2 TB Samsung MZQL21T9HCJR NVMe disk dedicated to the tests (the OS was running on a different disk).
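For context, the setup being benchmarked looks roughly like the following. This is a sketch based on Percona’s documented workflow; the key provider name and file path are placeholders, and the exact function names can differ between pg_tde releases.

  -- Install the extension and register a file-based key provider
  CREATE EXTENSION pg_tde;
  SELECT pg_tde_add_key_provider_file('local_keyring', '/var/lib/pgsql/keyring.file');
  SELECT pg_tde_set_principal_key('demo_key', 'local_keyring');

  -- Tables created with the tde_heap access method get encrypted pages
  CREATE TABLE accounts
  (
      id      bigint PRIMARY KEY,
      balance numeric
  ) USING tde_heap;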


Moving tempdb to a Temporary Disk in the Cloud

Jeff Horner doesn’t need permanence:

One of the simplest yet most impactful performance optimizations for SQL Server is relocating the tempdb system database to a fast, dedicated disk—especially if that disk is temporary or ephemeral. This is a common practice in cloud environments and high-performance database systems where tempdb churn is high and doesn’t require persistence across reboots.

Click through for more information on how this can work well for you, how to do it safely, and things to keep in mind. One thing I would add: I vaguely recall that, on Azure at least, the temporary D: drive might not be mapped at startup, and it can take a few seconds after boot for the mapping to occur. It may therefore be a good idea to set the SQL Server service to start on a delay, ensuring that you don’t run into an issue due to task ordering on startup.
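For reference, the move itself is one ALTER DATABASE per tempdb file; the new paths take effect at the next service restart, when tempdb is recreated. The D:\TempDB path below is illustrative.

  -- Point tempdb's files at the ephemeral disk (applies on restart)
  ALTER DATABASE tempdb
      MODIFY FILE (NAME = tempdev, FILENAME = 'D:\TempDB\tempdb.mdf');
  ALTER DATABASE tempdb
      MODIFY FILE (NAME = templog, FILENAME = 'D:\TempDB\templog.ldf');

The target folder has to exist when the service starts, which is exactly why the delayed-start tip matters on an ephemeral disk.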


Changes to Accelerated Database Recovery in 2025

Jordan Boich points out something interesting coming in SQL Server 2025:

Accelerated Database Recovery (ADR) was introduced in SQL Server 2019. Its main purpose is to allow for faster database recovery in the event of a crash or unexpected shutdown. Traditionally, the database engine handles crash recovery through a series of phases—analysis, redo, and undo—which can be inefficient and slow, especially when dealing with long-running transactions.

To make a long story short, ADR “shortcuts” the recovery process by introducing a new approach to handling undo operations. Instead of relying heavily on scanning the transaction log—which can be painfully slow for uncommitted or long-running transactions—ADR maintains a version store within the user database to track row-level changes. This allows SQL Server to quickly roll back uncommitted transactions without scanning the entire log. The result is much faster crash recovery, quicker rollbacks, and improved overall database availability, particularly in high-transaction environments.

Read on to see what’s new, as well as some of the consequences of enabling this feature.
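If you want to experiment with the feature, enabling it is a database-level switch; the database and filegroup names below are placeholders.

  -- Turn on ADR for a database
  ALTER DATABASE [YourDatabase]
      SET ACCELERATED_DATABASE_RECOVERY = ON;

  -- Optionally put the persistent version store on its own filegroup
  ALTER DATABASE [YourDatabase]
      SET ACCELERATED_DATABASE_RECOVERY = ON
      (PERSISTENT_VERSION_STORE_FILEGROUP = [VersionStoreFG]);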


Result Set Caching in Microsoft Fabric Data Warehouse

Emily Tehrani makes an announcement:

Result Set Caching is now available in preview for Microsoft Fabric Data Warehouse and Lakehouse SQL analytics endpoint. This performance optimization works transparently to cache the results of eligible T-SQL queries. When the same query is issued again, it directly retrieves the stored result instead of recompiling and recomputing the original query. This operation drastically cuts execution time for complex queries. The cache is then automatically managed on the user’s behalf. This lightweight performance boost is most beneficial for workloads, like reports, that issue many repetitive T-SQL queries to the DW and SQL analytics endpoint.

This is something I’ve wished we had on-premises for years and years, especially for data warehouses where you know the data only changes once every x hours or days. You can, of course, do this yourself with the cache-aside pattern and some caching solution, but that implies you have a layer between your end user and the data source that you fully control.
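I haven’t verified the exact preview syntax, but Azure Synapse dedicated SQL pools expose the same idea as a database-level toggle plus a session-level override, and I’d expect the Fabric preview to look similar. Treat the following as a sketch, with the warehouse name as a placeholder.

  -- Database-level toggle, as in Synapse dedicated SQL pools
  ALTER DATABASE [MyWarehouse] SET RESULT_SET_CACHING ON;

  -- Session-level override while testing a query's raw runtime
  SET RESULT_SET_CACHING OFF;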


SQL Server Performance Troubleshooting Script Updates

Erik Darling has been busy (plus raking in changes from contributors like he’s farming in an AFK game):

It’s been a busy few months working on scripts in my GitHub repo, including adding two new members to the family.

  • sp_IndexCleanup: Easily deduplicate indexes — scripts all the changes out for you!
  • sp_PerfCheck: A high-level review of performance-related settings and configurations!

Read on to see which other scripts have updates and where you can download all of Erik’s scripts.
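By way of illustration, invocation follows the familiar stored-procedure pattern; the parameter shown is my assumption, so check the repo’s documentation for the real options.

  -- High-level performance review with defaults
  EXEC dbo.sp_PerfCheck;

  -- Hypothetical: scope the index deduplication to one database
  EXEC dbo.sp_IndexCleanup @database_name = N'StackOverflow';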


Tuning SSIS Data Flow Buffers

Andy Brownsword speeds things up:

When using data flows in SSIS packages, we want the flow to be fast and fluid. How we improve the performance of a flow will vary between packages, but one element can consistently help – tuning memory utilisation.

In this post we’ll look at tuning the memory utilisation by altering the number of rows passing through the flow concurrently. Specifically we’re looking at the following properties:

  • DefaultBufferSize
  • DefaultBufferMaxRows

This is a pretty big deal: finding the right size allows you to keep data flowing smoothly without having to wait for buffers to fill. The defaults are aggressively low in most cases. And good on Andy for providing a series of tests to give practical numbers.
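To make the interplay concrete: with the default DefaultBufferSize of 10 MB and a row that works out to roughly 500 bytes, a buffer could hold about 20,000 rows, yet the default DefaultBufferMaxRows of 10,000 caps it at half that. Raising the cap (or the buffer size itself) lets each buffer carry more rows per pass, and Andy’s tests put numbers on how far that helps.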


The Challenges of Benchmarking

Tomas Vondra digs into some metrics:

I do a fair number of benchmarks, not only to validate patches, but also to find interesting (suspicious) stuff to improve. It’s an important part of my development workflow. And it’s fun 😉 But we’re dealing with complex systems (hardware, OS, DB, application), and that brings challenges. Every now and then I run into something that I don’t quite understand.

Consider a read-only pgbench, the simplest workload there is, with a single SELECT doing a lookup by PK. If you do this with a small data set on any machine, the expectation is near-linear scaling up to the number of cores. It’s not perfect (CPUs have frequency scaling and power management), but it should be close.

Click through for a quick look at some odd behavior, followed by a lot of interesting digging into the weeds trying to find answers.
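For reference, the entire transaction in pgbench’s built-in select-only script is this single statement, which is why near-linear scaling with core count is the natural expectation:

  -- pgbench -S: one primary-key lookup per transaction
  SELECT abalance FROM pgbench_accounts WHERE aid = :aid;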
