Press "Enter" to skip to content

Category: Compression

Run-Length Encoding in Power BI

Chris Webb performs some experiments:

Recently I was involved in an interesting discussion on Twitter X about how partitioning a table in a Power BI dataset semantic model can affect compression and therefore its size and query performance. You can read the thread here. It got me thinking: is there a way to get more detail on how well compression, and in particular run-length encoding (RLE from hereon), is working for a column when you’re using Import mode or Direct Lake? After a bit of research I found out there is, so let’s see some examples that illustrate what I learned.

The results are interesting and Chris explains why we see what we do.

Comments closed

Row Compression and Datatypes

Chad Callihan notes a property of row compression:

Things don’t always go as planned. This quote can apply to many topics, including sizing up your data types. When you started out with a new table, there were no concerns with SMALLINT—until you realized you needed INT. Or maybe you were rolling along fine with INT until you needed to up your game to BIGINT.

If you don’t have much data in your tables and you need to change data types, you may not have much to be concerned about when it comes to the impact of such a change. But problems can arise when you have a table full of data and want to shave off some time to make a data type change as smooth as possible.

The secret here is that, when you’re using row compression, SQL Server maintains the smallest number of bytes for the numeric data. For example, if you have three bytes of 0 and one byte of actual data in your int (i.e., the number is between 0-255), row compression will store one byte. If you subsequently bump this up to a bigint, row compression stores one byte, whereas default non-compressed pages need to be re-written to store seven bytes of 0 and one byte of your number.

This can be a really smart idea if you’re planning a data type change on a large table, as there are extremely few cases in which row compression is bad (“extremely few” as in “I’ve never seen one myself but I have to hold out the possibility that it could happen”) and if you have Enterprise Edition, you can rebuild all of those indexes online and ensure that you aren’t blocking the world for very long.

Comments closed

Combining Backup Encryption and Compression

Matthew McGiffen joins two great flavors:

In SQL Server you can also compress your encrypted backups. Unlike TDE this has been possible with Backup Encryption since the feature was first made available, and there have been no issues that have required fixing – though as always you should still test that restores work correctly. As mentioned in my post about compression with TDE, compressing backups has benefits not just in terms of file size but potentially also in reduced backup times as the time taken to write to disk is smaller.

Read on for more information. Microsoft did the right thing: they compress first and then encrypt; otherwise, you’re not getting any benefit from the compression.

Comments closed

Data Compression and Caching

Paul Randal answers a question about caching of compressed values in SQL Server:

In the previous Curious Case post, I explained how to monitor how well page compression is working. That prompted a reader to ask me how to monitor the hit rate of lookups in the cache of decompressed column values. She was very surprised by my answer…

Read on for Paul’s answer. I’d say that even with the information Paul discloses, I don’t think I’ve ever seen page compression be a bad idea in any system I’ve worked with. There are specific tables in which it’s a bad idea but never have I seen a situation in which I could not use page compression at all due to a performance impact.

Comments closed

Diving into Vertipaq Compression

Ed Pollack explains how Vertipaq compression works to make columnstore indexes so efficient:

Columnstore compression is an impressive array of algorithms that can take large analytic tables and significantly reduce their storage footprint. In doing so, IO is also reduced, and query performance dramatically improved.

This article dives into one aspect of columnstore compression that tends to get buried in all of the hoopla surrounding how awesome columnstore indexes are: Vertipaq optimization. This is a critical component of the columnstore compression process, and understanding how it works can significantly improve the performance of analytic workloads while reducing the computing resources required for the underlying data.

Click through for the steps of the process.

Comments closed

Extending Page Compression Checks

Eitan Blumin builds on a prior Paul Randal post:

So, here is my own expansion to Paul’s script, which adds the following:

* Outputs a remediation command to REBUILD each relevant index with ROW compression.

* The script has parameters for:

– Minimum number of compression attempts

– Maximum success rate percentage

– Index rebuild options

* The script outputs additional details such as:

– Whether the index/table in question is partitioned or not.

– Range-scan percentage

– Leaf-level updates percentage

* It runs the check for ALL accessible and writeable databases.

Click through for the script.

Comments closed

Page Compression Success Rates

Paul Randal has a script for us:

Yesterday I was chatting with Jonathan about some of the internals of the page compression algorithms for a client discussion. When you enable page compression, rows are first just row compressed (storing values with empty bytes stripped out) until a page fills up. The Storage Engine then attempts page compression for the page. 

Click through to see what that entails and how you can see what percentage of pages successfully compress at the page level.

Comments closed

TDE and Backup Compression

Andy Levy learns the truth:

For years, I thought that native backups of databases using Transparent Data Encryption (TDE) couldn’t be compressed. Between TDE being limited to Enterprise Edition until SQL Server 2019 and my own lack of experience with TDE in prior positions, I hadn’t really experimented with this myself. Some people have even gone so far as to skip compression in their backup jobs for TDE-enabled databases because there’s no need to burn those CPU cycles if you won’t get any compression, right?

But a curious thing happened after I upgraded a portion of my environment to SQL Server 2019 in late 2020. I observed that scheduled backups were compressing for some of my TDE-enabled databases, most notably the newer instances. And when I took ad hoc backups in any environment, they were compressed. So why wasn’t it working everywhere?

Read on for the explanation, though one correction: MAXTRANSFERSIZE is 1MB by default only when the database is not encrypted using TDE (and you aren’t backing up to a tape drive). If the database is encrypted using TDE, the default max transfer size is 64KB, and I think that’s what got Andy.

1 Comment

SQL Server Compression Q&A

Bob Pusateri has some follow-up Q&A after a session:

I was extremely fortunate to be able to present about data compression at the EightKB SQL Server internals conference last week. If you missed my talk in person, you can now view it, as well as the entire day, on YouTube!

I was able to answer many questions during the session, but there were a few still left after time ran out. I wanted to address them all, so here they are!

Click through for the video as well as some Q&A.

Comments closed