Compression – Curated SQL

SQL Server 2025 introduces a new compression algorithm, ZSTD (Zstandard), which can help with database backup performance. The implementation of ZSTD gives you more control over your backup performance in terms of CPU consumption and backup runtime. I recently ran some rough benchmarks comparing ZSTD, and its three compression levels, with the existing MS_XPRESS algorithm, and the results are compelling and give you some additional tools for managing performance for database backups.

Click through for Anthony’s test and findings.

Comments closed

Compression Tuple Filtering in TimescaleDB

Published 2024-09-19 by Kevin Feasel

Sven Klemm talks compression:

However, it also created a problem. While we had originally intended mutating compressed chunks to be a rare event, people were now pushing its limits with frequent inserts, updates, and deletes. Seeing our customers go all in on this feature confirmed that we were on the right track, but we had to double down on performance.

Today, we’re proud to announce significant improvements as of TimescaleDB 2.16.0, delivering up to 500x faster updates and deletes and 10x faster upserts on compressed data. These optimizations make compressed data behave even more like uncompressed data—without sacrificing performance or flexibility.

Read on to learn a bit more about compression in Postgres and TimescaleDB, as well as how compression tuple filtering works.

Comments closed

Data Compression and CPU Utilization

Published 2024-09-09 by Kevin Feasel

Kendra Little shares some advice:

Every time I share a recommendation to use data compression in SQL Server to reduce physical IO and keep frequently accessed data pages in memory, I hear the same concern from multiple people: won’t this increase CPU usage for inserts, updates, and deletes?

DBAs have been trained to ask this question by many trainings and a lot of online content – I used to mention this as a tradeoff to think about, myself– but I’ve found this is simply the wrong question to ask.

In this post I’ll share the two questions that are valuable to ask for your workload.

Kendra’s advice is very good, and to add my own two cents to the mix: the last place I was at did, in fact, see a pretty reasonable reduction in CPU utilization by performing page-level compression on any index where it made sense—and this was a very busy OLTP environment. The exceptions would be indexes making prominent use of things like Guids or chunks of binary, which don’t compress very well at all. In all my FTE and consulting years, I’ve never run into a circumstance in which compression caused a significant gain in CPU utilization.

Comments closed

Data Compression and Data Type Changes

Published 2024-04-30 by Kevin Feasel

Bob Pusateri asks the important questions:

A few different times I have been asked one or more forms of the following question:

Can datatypes be changed faster with data compression enabled?

I’ve always replied that I’m pretty sure compression will help in this situation, because based on my understanding, it should. But I’ve never had any actual data to back up this belief. Until now. I recently set up a demonstration to test this, and I’m very happy to share the results.

If you want to see the results, you’re going to have to read Bob’s article.

Comments closed

Run-Length Encoding in Power BI

Published 2023-11-06 by Kevin Feasel

Chris Webb performs some experiments:

Recently I was involved in an interesting discussion on ~~Twitter~~ X about how partitioning a table in a Power BI ~~dataset~~ semantic model can affect compression and therefore its size and query performance. You can read the thread here. It got me thinking: is there a way to get more detail on how well compression, and in particular run-length encoding (RLE from hereon), is working for a column when you’re using Import mode or Direct Lake? After a bit of research I found out there is, so let’s see some examples that illustrate what I learned.

The results are interesting and Chris explains why we see what we do.

Comments closed

Row Compression and Datatypes

Published 2023-08-02 by Kevin Feasel

Chad Callihan notes a property of row compression:

Things don’t always go as planned. This quote can apply to many topics, including sizing up your data types. When you started out with a new table, there were no concerns with SMALLINT—until you realized you needed INT. Or maybe you were rolling along fine with INT until you needed to up your game to BIGINT.

If you don’t have much data in your tables and you need to change data types, you may not have much to be concerned about when it comes to the impact of such a change. But problems can arise when you have a table full of data and want to shave off some time to make a data type change as smooth as possible.

The secret here is that, when you’re using row compression, SQL Server maintains the smallest number of bytes for the numeric data. For example, if you have three bytes of 0 and one byte of actual data in your int (i.e., the number is between 0-255), row compression will store one byte. If you subsequently bump this up to a bigint, row compression stores one byte, whereas default non-compressed pages need to be re-written to store seven bytes of 0 and one byte of your number.

This can be a really smart idea if you’re planning a data type change on a large table, as there are extremely few cases in which row compression is bad (“extremely few” as in “I’ve never seen one myself but I have to hold out the possibility that it could happen”) and if you have Enterprise Edition, you can rebuild all of those indexes online and ensure that you aren’t blocking the world for very long.

Comments closed

Combining Backup Encryption and Compression

Published 2023-05-25 by Kevin Feasel

Matthew McGiffen joins two great flavors:

In SQL Server you can also compress your encrypted backups. Unlike TDE this has been possible with Backup Encryption since the feature was first made available, and there have been no issues that have required fixing – though as always you should still test that restores work correctly. As mentioned in my post about compression with TDE, compressing backups has benefits not just in terms of file size but potentially also in reduced backup times as the time taken to write to disk is smaller.

Read on for more information. Microsoft did the right thing: they compress first and then encrypt; otherwise, you’re not getting any benefit from the compression.

Comments closed

Data Compression and Caching

Published 2022-02-08 by Kevin Feasel

Paul Randal answers a question about caching of compressed values in SQL Server:

In the previous Curious Case post, I explained how to monitor how well page compression is working. That prompted a reader to ask me how to monitor the hit rate of lookups in the cache of decompressed column values. She was very surprised by my answer…

Read on for Paul’s answer. I’d say that even with the information Paul discloses, I don’t think I’ve ever seen page compression be a bad idea in any system I’ve worked with. There are specific tables in which it’s a bad idea but never have I seen a situation in which I could not use page compression at all due to a performance impact.

Comments closed

Diving into Vertipaq Compression

Published 2022-01-20 by Kevin Feasel

Ed Pollack explains how Vertipaq compression works to make columnstore indexes so efficient:

Columnstore compression is an impressive array of algorithms that can take large analytic tables and significantly reduce their storage footprint. In doing so, IO is also reduced, and query performance dramatically improved.
This article dives into one aspect of columnstore compression that tends to get buried in all of the hoopla surrounding how awesome columnstore indexes are: Vertipaq optimization. This is a critical component of the columnstore compression process, and understanding how it works can significantly improve the performance of analytic workloads while reducing the computing resources required for the underlying data.

Click through for the steps of the process.

Comments closed

Extending Page Compression Checks

Published 2022-01-18 by Kevin Feasel

Eitan Blumin builds on a prior Paul Randal post:

So, here is my own expansion to Paul’s script, which adds the following:
* Outputs a remediation command to REBUILD each relevant index with ROW compression.
* The script has parameters for:
– Minimum number of compression attempts
– Maximum success rate percentage
– Index rebuild options
* The script outputs additional details such as:
– Whether the index/table in question is partitioned or not.
– Range-scan percentage
– Leaf-level updates percentage
* It runs the check for ALL accessible and writeable databases.

Click through for the script.

Comments closed

Category: Compression

Performance Testing ZSTD Backup Compression

Compression Tuple Filtering in TimescaleDB

Data Compression and CPU Utilization

Data Compression and Data Type Changes

Run-Length Encoding in Power BI

Row Compression and Datatypes

Combining Backup Encryption and Compression

Data Compression and Caching

Diving into Vertipaq Compression

Extending Page Compression Checks