Press "Enter" to skip to content

Category: Compression

Compression Tuple Filtering in TimescaleDB

Sven Klemm talks compression:

However, it also created a problem. While we had originally intended mutating compressed chunks to be a rare event, people were now pushing its limits with frequent inserts, updates, and deletes. Seeing our customers go all in on this feature confirmed that we were on the right track, but we had to double down on performance.

Today, we’re proud to announce significant improvements as of TimescaleDB 2.16.0, delivering up to 500x faster updates and deletes and 10x faster upserts on compressed data. These optimizations make compressed data behave even more like uncompressed data—without sacrificing performance or flexibility.

Read on to learn a bit more about compression in Postgres and TimescaleDB, as well as how compression tuple filtering works.

Comments closed

Data Compression and CPU Utilization

Kendra Little shares some advice:

Every time I share a recommendation to use data compression in SQL Server to reduce physical IO and keep frequently accessed data pages in memory, I hear the same concern from multiple people: won’t this increase CPU usage for inserts, updates, and deletes?

DBAs have been trained to ask this question by many trainings and a lot of online content – I used to mention this as a tradeoff to think about, myself – but I’ve found this is simply the wrong question to ask.

In this post I’ll share the two questions that are valuable to ask for your workload.

Kendra’s advice is very good, and to add my own two cents to the mix: the last place I was at did, in fact, see a pretty reasonable reduction in CPU utilization by performing page-level compression on any index where it made sense—and this was a very busy OLTP environment. The exceptions would be indexes making prominent use of things like GUIDs or chunks of binary, which don’t compress well at all. In all my FTE and consulting years, I’ve never run into a circumstance in which compression caused a significant increase in CPU utilization.
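
To make that concrete, here is a minimal sketch of how you might evaluate and then apply page compression on a single index (the dbo.Orders table and index name are hypothetical):

-- Estimate the space savings before committing to anything.
EXEC sp_estimate_data_compression_savings
    @schema_name = N'dbo',
    @object_name = N'Orders',
    @index_id = NULL,            -- all indexes on the table
    @partition_number = NULL,    -- all partitions
    @data_compression = N'PAGE';

-- If the estimate looks worthwhile, rebuild the index with page compression.
-- ONLINE = ON requires Enterprise Edition (or Azure SQL Database).
ALTER INDEX IX_Orders_OrderDate ON dbo.Orders
REBUILD WITH (DATA_COMPRESSION = PAGE, ONLINE = ON);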

Comments closed

Data Compression and Data Type Changes

Bob Pusateri asks the important questions:

A few different times I have been asked one or more forms of the following question:

Can datatypes be changed faster with data compression enabled?

I’ve always replied that I’m pretty sure compression will help in this situation, because based on my understanding, it should. But I’ve never had any actual data to back up this belief. Until now. I recently set up a demonstration to test this, and I’m very happy to share the results.

If you want to see the results, you’re going to have to read Bob’s article.

Comments closed

Run-Length Encoding in Power BI

Chris Webb performs some experiments:

Recently I was involved in an interesting discussion on Twitter/X about how partitioning a table in a Power BI semantic model can affect compression and therefore its size and query performance. You can read the thread here. It got me thinking: is there a way to get more detail on how well compression, and in particular run-length encoding (RLE from hereon), is working for a column when you’re using Import mode or Direct Lake? After a bit of research I found out there is, so let’s see some examples that illustrate what I learned.

The results are interesting and Chris explains why we see what we do.

Comments closed

Row Compression and Datatypes

Chad Callihan notes a property of row compression:

Things don’t always go as planned. This quote can apply to many topics, including sizing up your data types. When you started out with a new table, there were no concerns with SMALLINT—until you realized you needed INT. Or maybe you were rolling along fine with INT until you needed to up your game to BIGINT.

If you don’t have much data in your tables and you need to change data types, you may not have much to be concerned about when it comes to the impact of such a change. But problems can arise when you have a table full of data and want to shave off some time to make a data type change as smooth as possible.

The secret here is that, when you’re using row compression, SQL Server stores the smallest number of bytes needed for numeric data. For example, if your int is three bytes of 0 and one byte of actual data (i.e., the number is between 0 and 255), row compression stores one byte. If you subsequently bump the column up to a bigint, row compression still stores one byte, whereas non-compressed pages need to be rewritten to store seven bytes of 0 and one byte of your number.

This can be a really smart idea if you’re planning a data type change on a large table, as there are extremely few cases in which row compression is bad (“extremely few” as in “I’ve never seen one myself but I have to hold out the possibility that it could happen”) and if you have Enterprise Edition, you can rebuild all of those indexes online and ensure that you aren’t blocking the world for very long.
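
As a rough sketch of what that looks like in practice (the table and column names here are hypothetical), you enable row compression ahead of the data type change:

-- Enable row compression on the clustered index or heap first; nonclustered
-- indexes need their own rebuilds if you want them compressed as well.
ALTER TABLE dbo.Orders
REBUILD WITH (DATA_COMPRESSION = ROW);

-- With row compression in place, small values already occupy only the bytes
-- they need, so widening the column does not have to grow every stored row.
ALTER TABLE dbo.Orders
ALTER COLUMN Quantity BIGINT NOT NULL;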

Comments closed

Combining Backup Encryption and Compression

Matthew McGiffen joins two great flavors:

In SQL Server you can also compress your encrypted backups. Unlike TDE this has been possible with Backup Encryption since the feature was first made available, and there have been no issues that have required fixing – though as always you should still test that restores work correctly. As mentioned in my post about compression with TDE, compressing backups has benefits not just in terms of file size but potentially also in reduced backup times as the time taken to write to disk is smaller.

Read on for more information. Microsoft did the right thing: they compress first and then encrypt; otherwise, you’re not getting any benefit from the compression.
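
For reference, taking a compressed and encrypted backup is just a matter of combining the two options in the WITH clause. A minimal sketch, assuming a database called SalesDb and a pre-existing certificate named BackupCert in master (both names are hypothetical):

BACKUP DATABASE SalesDb
TO DISK = N'D:\Backups\SalesDb.bak'
WITH
    COMPRESSION,
    ENCRYPTION (ALGORITHM = AES_256, SERVER CERTIFICATE = BackupCert),
    STATS = 10;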

Comments closed

Data Compression and Caching

Paul Randal answers a question about caching of compressed values in SQL Server:

In the previous Curious Case post, I explained how to monitor how well page compression is working. That prompted a reader to ask me how to monitor the hit rate of lookups in the cache of decompressed column values. She was very surprised by my answer…

Read on for Paul’s answer. I’d say that, even with the information Paul discloses, I’ve never seen page compression be a net negative for any system I’ve worked with. There are specific tables for which it’s a bad idea, but I’ve never run into a situation in which I couldn’t use page compression at all due to its performance impact.

Comments closed

Diving into Vertipaq Compression

Ed Pollack explains how Vertipaq compression works to make columnstore indexes so efficient:

Columnstore compression is an impressive array of algorithms that can take large analytic tables and significantly reduce their storage footprint. In doing so, IO is also reduced, and query performance dramatically improved.

This article dives into one aspect of columnstore compression that tends to get buried in all of the hoopla surrounding how awesome columnstore indexes are: Vertipaq optimization. This is a critical component of the columnstore compression process, and understanding how it works can significantly improve the performance of analytic workloads while reducing the computing resources required for the underlying data.

Click through for the steps of the process.
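
If you want to poke at the results on one of your own tables, a rough sketch (the dbo.FactSales name is hypothetical) is to query the per-column segment metadata; the encoding type and on-disk size give a feel for how well each column compressed:

SELECT
    s.column_id,
    s.segment_id,
    s.encoding_type,     -- which encoding the engine chose for this segment
    s.row_count,
    s.on_disk_size       -- bytes on disk for this segment
FROM sys.column_store_segments AS s
JOIN sys.partitions AS p
    ON s.hobt_id = p.hobt_id
WHERE p.object_id = OBJECT_ID(N'dbo.FactSales')
ORDER BY s.column_id, s.segment_id;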

Comments closed

Extending Page Compression Checks

Eitan Blumin builds on a prior Paul Randal post:

So, here is my own expansion to Paul’s script, which adds the following:

* Outputs a remediation command to REBUILD each relevant index with ROW compression.

* The script has parameters for:

– Minimum number of compression attempts

– Maximum success rate percentage

– Index rebuild options

* The script outputs additional details such as:

– Whether the index/table in question is partitioned or not.

– Range-scan percentage

– Leaf-level updates percentage

* It runs the check for ALL accessible and writeable databases.

Click through for the script.
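
To give a flavor of the core check, here’s a stripped-down sketch (not Eitan’s script; the thresholds and single-database scope are illustrative) that finds indexes where page compression is attempted often but rarely succeeds and builds a REBUILD command to switch them to row compression:

SELECT
    QUOTENAME(OBJECT_SCHEMA_NAME(i.object_id)) + N'.' + QUOTENAME(OBJECT_NAME(i.object_id)) AS table_name,
    i.name AS index_name,
    os.page_compression_attempt_count,
    os.page_compression_success_count,
    N'ALTER INDEX ' + QUOTENAME(i.name) + N' ON '
        + QUOTENAME(OBJECT_SCHEMA_NAME(i.object_id)) + N'.' + QUOTENAME(OBJECT_NAME(i.object_id))
        + N' REBUILD WITH (DATA_COMPRESSION = ROW);' AS remediation_command
FROM sys.dm_db_index_operational_stats(DB_ID(), NULL, NULL, NULL) AS os
JOIN sys.indexes AS i
    ON i.object_id = os.object_id
   AND i.index_id = os.index_id
WHERE i.index_id > 0                                                 -- skip heaps
  AND os.page_compression_attempt_count >= 100                       -- minimum attempts
  AND 100.0 * os.page_compression_success_count
      / NULLIF(os.page_compression_attempt_count, 0) < 20;           -- success rate threshold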

Comments closed

Page Compression Success Rates

Paul Randal has a script for us:

Yesterday I was chatting with Jonathan about some of the internals of the page compression algorithms for a client discussion. When you enable page compression, rows are first just row compressed (storing values with empty bytes stripped out) until a page fills up. The Storage Engine then attempts page compression for the page. 

Click through to see what that entails and how you can see what percentage of pages successfully compress at the page level.
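
For a quick look at the raw numbers in the current database (a sketch against the operational stats DMV, not Paul’s script), something like this shows the success rate per index:

SELECT
    OBJECT_NAME(os.object_id) AS table_name,
    os.index_id,
    os.page_compression_attempt_count,
    os.page_compression_success_count,
    100.0 * os.page_compression_success_count
        / NULLIF(os.page_compression_attempt_count, 0) AS success_rate_pct
FROM sys.dm_db_index_operational_stats(DB_ID(), NULL, NULL, NULL) AS os
WHERE os.page_compression_attempt_count > 0
ORDER BY success_rate_pct;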

Comments closed