Category: Compression

We compress to gain space, correct? So why sacrifice it to our indexes?
Here is a script (complete with demo databases and tables) that will find the culprits and fix them for you. Put it (minus the demo databases and tables) in a job, and quit worrying about this. Your tables (and indexes) will be sparkling clean. You’re welcome.

Click through for the script. It’s a good reminder that compressing a clustered index does not automatically compress non-clustered indexes—nor should it, as these won’t necessarily compress at the same ratio because data type makeup can differ.

Comments closed

Looking At Compressed Pages

Published 2019-02-20 by Kevin Feasel

Jess Pomfret shows us what compressed data looks like in SQL Server:

We first need to switch on trace flag 3604: this will write the output of our DBCC PAGE command to the messages tab instead of the event log.
There are 4 parameters for DBCC PAGE: we will need to pass in the database name (or id), the file number, the page id and the print option. Using a print option of 0 will give us just the page header. In these examples I’m going to use option 3 which gives us more details on the rows stored on the page. For more information on using DBCC PAGE I’d recommend Paul Randal’s post “How to use DBCC PAGE“.

This kind of investigation lets you see how compression really works.

Comments closed

What Happens With Data Compression + Backup Compression

Published 2018-08-20 by Kevin Feasel

Jess Pomfret tests what happens when you enable backup compression for databases with already-compressed tables in SQL Server:

What happens if I use data compression and backup compression, do I get double compression?

This is a great question, and without diving too deeply into how backup compression works I’m going to do a simple experiment on the WideWorldImporters database. I’ve restored this database to my local SQL Server 2016 instance and I’m simply going to back it up several times under different conditions.

After restoring the database it’s about 3GB in size, so our testing will be on a reasonably small database. It would be interesting to see how the results change as the database size increases, perhaps a future blog post.

Click through for the answer.

Comments closed

When Rowstore Compression Beats Columnstore

Published 2018-06-29 by Kevin Feasel

Joe Obbish looks at scenarios where page-level compression on rowstore tables can beat columnstore compression in terms of resultant table size:

It’s certainly more difficult to come up with a demo that works without string columns, but consider how the page compression algorithm works. Data can be compressed on page basis, which includes both multiple rows and multiple columns. That means that page compression can achieve a higher compression ratio when a row has identical values in different columns. Columnstore is only able to compress on an individual column basis and you won’t directly see better compression with repeated values in different columns for a single row (as far as I know).

Interestingly, Joe also comes up with a scenario where row-level compression can beat columnstore even without string values. All this said, the normal case when dealing with non-string data is that columnstore tends to compress a lot better.

Comments closed

Backup Compression And Encryption

Published 2017-10-10 by Kevin Feasel

Arun Sirpal shows the combined effects of backup encryption and backup compression in SQL Server 2017:

Do not forget about the certificate! Warning: The certificate used for encrypting the database encryption key has not been backed up. Imagine if you need to recover the backup and you can’t? You will get the dreaded thumbprint error.

Msg 33111, Level 16, State 3, Line 25 Cannot find server certificate with thumbprint ‘0x78FAB5A2A5D593FD3C4E163C90B745F70AB51233’. Msg 3013, Level 16, State 1, Line 25

RESTORE DATABASE is terminating abnormally.

So make sure you respect this certificate (and the key) and back it up and re-create them on the target server for a successful restore.

In SQL Server 2016 and 2017, there’s no reason not to encrypt backups; the marginal cost is practically nil even if you’re low enough on disk space that you need to do backup compression.

Comments closed

Finding Compressable Indexes

Published 2017-08-30 by Kevin Feasel

Tracy Boggiano has a script to help you figure out which indexes make sense to compress:

We can write procedure check periodically rather a table will benefit from compression or not. There are a few tricks though:

If you have sparse columns you cannot compress the table, we check for that in lines 70-74.

Next if is a index type of 1 (a clustered index) it cannot contain LOB data types image, ntext, or text. Which these days shouldn’t be a problem, they have been after all deprecated, so you aren’t using them. We check for these in lines 76-84.

Finally, we perform the same test for LOB data on index type of 2 (nonclustered indexes) in lines 85-95.

If everything checks out OK we call our second procedure to calculate the percentage of space we would space and if we would save more than 25% then we create script to compression the index. Now we have a @Debug parameter in here so it can just show us what code it would execute to determine what it will do.

Click through for the code for the two procedures Tracy has created.

Comments closed

The Cost Of Compression

Published 2017-01-24 by Kevin Feasel

Erin Stellato looks at the COMPRESS function:

While this data is interesting, I’m more curious about how compression affects my everyday SELECT queries. I have a set of three stored procedures that each have one SELECT query, so that each index is used. I created these procedures for each table, and then wrote a script to pull values for first and last names to use for testing. Here is the script to create the procedures.

Once we have the stored procedures created, we can run the script below to call them. Kick this off and then wait a couple minutes…

To me, the COMPRESS function is most useful for compressing information you tend not to search through but need to keep the in the database, like HTML markup or long descriptions.

Comments closed

Data Compression

Published 2016-08-22 by Kevin Feasel

Melissa Connors discusses compression options and gives examples of data which will compress and that which will not:

Page Compression is what I like to refer to as “compression for real this time” as it goes well beyond the smart storage method of row and uses patterns/repeating values to condense the stored data.

First, to gain a better understanding of this method, check out a simple representation of a page of data. This is illustrated below in Figure 1. You’ll notice that there are some repeating values (e.g. SQLR) and some repeated strings of characters (e.g. SSSLL).

I really appreciate getting an idea of what kind of data does not compress well. You’d think auto-incrementing numbers would be another scenario, but Melissa explains how that’s not necessarily the case.

Comments closed

Compressing LOB Data

Published 2016-06-16 by Kevin Feasel

Andy Mallon wants to compress LOB data:

The COMPRESS() function takes an input of string or binary data, and applies the gzip algorithm to it. It returns a value of type varbinary(max). In essence, instead of storing string or binary data, you can gzip it up and store it in a varbinary(max) column. There’s also a DECOMPRESS() function for when you are reading the data and need to unzip it.

This costs some CPU, but gzip can save quite a bit of space. How much space, and whether it’s worth the CPU cost will vary depending on your data and workload. In this blog post, we’ll take a look at one table. We’ll look at the space savings we get out of using COMPRESS(), and we’ll look at the effort necessary to implement it.

Read on for Andy’s test and thoughts.

Comments closed

Compression Performance

Published 2016-03-16 by Kevin Feasel

Rolf Tesmer digs into the case of compression of building an index whose leading column has a low cardinality:

That first one is a cracker – it hit me once when compressing a SQL Server table (600M+ rows) on a 64 core Enterprise SQL Server. After benchmarking several other data compression activities I thought I had a basic “rule of thumb” (based on GB data size and number of rows)… of which just happened to be coincidence!

This also begs the question of why would you use low selectivity indexes? Well I can think of a few cases – but the one which stands out the most is the identification of a small number of rows within a greater collection – such as an Index on TYPE columns (ie; [ProcessingStatusFlag] CHAR(1) = [P]rocessed, [U]nprocessed, [W]orking, [F]ailed, etc)

… AND SO – lets do some testing to validate this puppy!

There’s a significant difference here, so check out Rolf’s post for the details.

Comments closed