Press "Enter" to skip to content

Category: Compression

Finding Table and Index Compression Levels

Kenneth Fisher has a script for us:

It’s been a while since I worked with compression and the other day I needed to check which of my indexes were compressed and which weren’t. Now, I knew the information wasn’t going to be in sys.tables and I couldn’t find it in sys.indexes or INDEXPROPERTY(). I’ll be honest it had me stumped for a little bit. Until I remembered something!

Read on for Kenneth’s remembered insight and a script to get the job done.
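If you just need the quick answer, SQL Server tracks compression per partition, so a query along these lines (a sketch, not Kenneth's exact script) will show the compression level for every index:

-- data_compression_desc lives in sys.partitions, not sys.tables or sys.indexes.
SELECT
    s.name AS schema_name,
    t.name AS table_name,
    i.name AS index_name,
    i.type_desc,
    p.partition_number,
    p.data_compression_desc
FROM sys.partitions AS p
JOIN sys.indexes AS i ON i.object_id = p.object_id AND i.index_id = p.index_id
JOIN sys.tables  AS t ON t.object_id = i.object_id
JOIN sys.schemas AS s ON s.schema_id = t.schema_id
ORDER BY s.name, t.name, i.index_id, p.partition_number;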


Disabled Indexes Tell No Compression Tales

Eric Cobb gives us a warning around disabling indexes:

Here at work we have a very large, very intensive data load that disables and rebuilds indexes as part of the process. We recently added compression to many of the tables and indexes in the database because it was growing quite large (around 28TB at the time). After adding compression, we got the database size down to somewhere around 17TB.

So you can imagine our surprise when the DB size jumped back up to over 30TB after the last data load! In trying to figure out what happened I discovered that most of the data compression was gone.

That’s…not great. Eric shows us a demo as well and notes that it still applies to SQL Server 2019. I’d be apt to call it a bug, myself.
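If you want to check whether your own environment behaves this way, a rough repro (table and index names here are hypothetical) is to capture the compression setting before and after a disable/rebuild cycle:

-- Check the setting before:
SELECT i.name, p.data_compression_desc
FROM sys.partitions AS p
JOIN sys.indexes AS i ON i.object_id = p.object_id AND i.index_id = p.index_id
WHERE i.name = 'IX_Example';

ALTER INDEX IX_Example ON dbo.SomeTable DISABLE;
ALTER INDEX IX_Example ON dbo.SomeTable REBUILD;   -- no DATA_COMPRESSION specified

-- Re-run the query above; if the compression is gone, state it explicitly on rebuild:
ALTER INDEX IX_Example ON dbo.SomeTable REBUILD
    WITH (DATA_COMPRESSION = PAGE);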


Columnstore Versus Page Compression

Aaron Bertrand compares columnstore and page compression on a specific table:

Recently someone at work asked for more space to accommodate a rapidly growing table. At the time it had 3.75 billion rows, presented on 143 million pages, and occupying ~1.14TB. Of course we can always throw more disk at a table, but I wanted to see if we could scale this more efficiently than the current linear trend. Sounds like a great job for compression, right? But I also wanted to try out some other solutions, including columnstore – which people are surprisingly reluctant to try. I am no Niko, but I wanted to make an effort to see what it could do for us here.

Note that I am not focusing on reporting workload or other read query performance at this time – I merely want to see what impact I can have on storage (and memory) footprint of this data.

Here is the original table. I’ve changed table and column names to protect the innocent, but everything else is relatively accurate.

Page compression won, and I’ve got a pretty good idea why (though some of the diagnostic info is gone): Aaron has several VARCHAR and NVARCHAR columns, and those blow up the columnstore dictionary pretty fast. Aaron has more to go in this series, so stay tuned.
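If you want to run a similar comparison on one of your own tables, the two candidates boil down to something like this (table and index names are placeholders, and the columnstore test is done on a copy so the original stays intact):

-- Page-compress the existing rowstore table (heap or clustered index):
ALTER TABLE dbo.BigTable REBUILD WITH (DATA_COMPRESSION = PAGE);

-- Convert a copy of the table to clustered columnstore:
CREATE CLUSTERED COLUMNSTORE INDEX CCI_BigTable ON dbo.BigTable_Copy;

-- Compare the storage footprint of each:
EXEC sp_spaceused 'dbo.BigTable';
EXEC sp_spaceused 'dbo.BigTable_Copy';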


Problems with sp_estimate_data_compression_savings

Andy Mallon knows it’s getting close to Festivus and he has some grievances to air:

If you’re working with compressed indexes, SQL Server provides a system stored procedure to help test the space savings of implementing data compression: sp_estimate_data_compression_savings. Starting in SQL Server 2019, it can even be used to estimate savings with columnstore.

I really don’t like sp_estimate_data_compression_savings. In fact, I kind of hate it. It’s not always very accurate–and even when it is accurate, the results can be misleading. Before I get ranty about why I don’t like it, let’s look at it in action.

Andy makes good points in this, so check it out.
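For reference, the procedure takes a schema name, object name, optional index ID and partition number, and the compression setting to estimate; a typical call looks like this (the table name is a placeholder):

EXEC sys.sp_estimate_data_compression_savings
    @schema_name      = 'dbo',
    @object_name      = 'SomeTable',   -- placeholder
    @index_id         = NULL,          -- NULL = all indexes
    @partition_number = NULL,          -- NULL = all partitions
    @data_compression = 'PAGE';        -- NONE | ROW | PAGE (2019 adds columnstore options)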


Compress Those Indexes

Pamela Mooney reminds us that if you’re going to compress your heap or clustered index, remember those non-clustered indexes as well:

We compress to gain space, correct? So why sacrifice it to our indexes?

Here is a script (complete with demo databases and tables) that will find the culprits and fix them for you.  Put it (minus the demo databases and tables) in a job, and quit worrying about this.  Your tables (and indexes) will be sparkling clean. You’re welcome.

Click through for the script. It’s a good reminder that compressing a clustered index does not automatically compress its non-clustered indexes, nor should it, since they won’t necessarily compress at the same ratio: their data type makeup can differ.
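If you just want to spot the indexes that were left behind, a minimal query along these lines (not Pamela's script) lists uncompressed nonclustered indexes and generates a rebuild command for each:

-- Sketch: nonclustered indexes still at NONE, with a rebuild command.
-- On partitioned tables this returns one row per partition.
SELECT
    QUOTENAME(s.name) + '.' + QUOTENAME(t.name) AS table_name,
    i.name AS index_name,
    p.data_compression_desc,
    'ALTER INDEX ' + QUOTENAME(i.name) + ' ON ' + QUOTENAME(s.name) + '.' + QUOTENAME(t.name)
        + ' REBUILD WITH (DATA_COMPRESSION = PAGE);' AS rebuild_command
FROM sys.indexes AS i
JOIN sys.tables  AS t ON t.object_id = i.object_id
JOIN sys.schemas AS s ON s.schema_id = t.schema_id
JOIN sys.partitions AS p ON p.object_id = i.object_id AND p.index_id = i.index_id
WHERE i.type = 2                          -- nonclustered
  AND p.data_compression_desc = 'NONE';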


Looking At Compressed Pages

Jess Pomfret shows us what compressed data looks like in SQL Server:

We first need to switch on trace flag 3604: this will write the output of our DBCC PAGE command to the messages tab instead of the event log.

There are 4 parameters for DBCC PAGE: we will need to pass in the database name (or id), the file number, the page id and the print option.  Using a print option of 0 will give us just the page header. In these examples I’m going to use option 3 which gives us more details on the rows stored on the page. For more information on using DBCC PAGE I’d recommend Paul Randal’s post “How to use DBCC PAGE”.

This kind of investigation lets you see how compression really works.
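The basic pattern looks something like this, using WideWorldImporters as a stand-in database and a placeholder page ID (the undocumented DBCC IND is one way to find real page IDs for a table):

DBCC TRACEON(3604);                              -- send DBCC output to the messages tab

-- List pages belonging to the table (database, table, index id):
DBCC IND('WideWorldImporters', 'Sales.Orders', 1);

-- Dump one page: database, file id, page id, print option (3 = row-level detail):
DBCC PAGE('WideWorldImporters', 1, 12345, 3);    -- 12345 is a placeholder page id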


What Happens With Data Compression + Backup Compression

Jess Pomfret tests what happens when you enable backup compression for databases with already-compressed tables in SQL Server:

What happens if I use data compression and backup compression, do I get double compression?

This is a great question, and without diving too deeply into how backup compression works I’m going to do a simple experiment on the WideWorldImporters database.  I’ve restored this database to my local SQL Server 2016 instance and I’m simply going to back it up several times under different conditions.

After restoring the database it’s about 3GB in size, so our testing will be on a reasonably small database.  It would be interesting to see how the results change as the database size increases, perhaps a future blog post.

Click through for the answer.
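The experiment is easy to repeat on your own instance; a minimal version (backup paths are placeholders) backs the database up with and without backup compression and then compares the sizes recorded in msdb:

BACKUP DATABASE WideWorldImporters
    TO DISK = N'C:\Backups\WWI_NoCompression.bak'   -- placeholder path
    WITH NO_COMPRESSION, INIT;

BACKUP DATABASE WideWorldImporters
    TO DISK = N'C:\Backups\WWI_Compressed.bak'      -- placeholder path
    WITH COMPRESSION, INIT;

-- Compare raw and compressed sizes from the backup history:
SELECT TOP (2) database_name, backup_size, compressed_backup_size
FROM msdb.dbo.backupset
WHERE database_name = 'WideWorldImporters'
ORDER BY backup_finish_date DESC;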


When Rowstore Compression Beats Columnstore

Joe Obbish looks at scenarios where page-level compression on rowstore tables can beat columnstore compression in terms of resultant table size:

It’s certainly more difficult to come up with a demo that works without string columns, but consider how the page compression algorithm works. Data can be compressed on a page basis, which includes both multiple rows and multiple columns. That means that page compression can achieve a higher compression ratio when a row has identical values in different columns. Columnstore is only able to compress on an individual column basis and you won’t directly see better compression with repeated values in different columns for a single row (as far as I know).

Interestingly, Joe also comes up with a scenario where row-level compression can beat columnstore even without string values.  All this said, the normal case when dealing with non-string data is that columnstore tends to compress a lot better.
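One way to experiment with this yourself is a table whose columns repeat the same value in every row, compared against a columnstore copy of the same data and measured with sp_spaceused (a sketch, not Joe's demo):

-- Every column in a row holds the same value.
CREATE TABLE dbo.RepeatedValues (col1 INT NOT NULL, col2 INT NOT NULL, col3 INT NOT NULL);

INSERT INTO dbo.RepeatedValues (col1, col2, col3)
SELECT n, n, n
FROM (
    SELECT TOP (1000000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
    FROM sys.all_objects AS a CROSS JOIN sys.all_objects AS b
) AS numbers;

-- Page-compressed rowstore version:
ALTER TABLE dbo.RepeatedValues REBUILD WITH (DATA_COMPRESSION = PAGE);
EXEC sp_spaceused 'dbo.RepeatedValues';

-- Columnstore copy of the same data for comparison:
SELECT col1, col2, col3 INTO dbo.RepeatedValues_CS FROM dbo.RepeatedValues;
CREATE CLUSTERED COLUMNSTORE INDEX CCI_RepeatedValues ON dbo.RepeatedValues_CS;
EXEC sp_spaceused 'dbo.RepeatedValues_CS';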


Backup Compression And Encryption

Arun Sirpal shows the combined effects of backup encryption and backup compression in SQL Server 2017:

Do not forget about the certificate! Warning: The certificate used for encrypting the database encryption key has not been backed up. Imagine if you need to recover the backup and you can’t?  You will get the dreaded thumbprint error.

Msg 33111, Level 16, State 3, Line 25
Cannot find server certificate with thumbprint ‘0x78FAB5A2A5D593FD3C4E163C90B745F70AB51233’.
Msg 3013, Level 16, State 1, Line 25
RESTORE DATABASE is terminating abnormally.

So make sure you respect this certificate (and the key) and back it up and re-create them on the target server for a successful restore.

In SQL Server 2016 and 2017, there’s no reason not to encrypt backups; the marginal cost is practically nil even if you’re low enough on disk space that you need to do backup compression.
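The moving parts look roughly like this; the certificate name, passwords, file paths, and database are all placeholders:

USE master;
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'StrongPassword!1';        -- placeholder
CREATE CERTIFICATE BackupCert WITH SUBJECT = 'Backup encryption';

-- Back up the certificate and its private key; you need these to restore elsewhere.
BACKUP CERTIFICATE BackupCert
    TO FILE = N'C:\Keys\BackupCert.cer'
    WITH PRIVATE KEY (FILE = N'C:\Keys\BackupCert.pvk',
                      ENCRYPTION BY PASSWORD = 'AnotherStrongPassword!1');

-- Compressed and encrypted backup in one statement:
BACKUP DATABASE WideWorldImporters
    TO DISK = N'C:\Backups\WWI_Encrypted.bak'
    WITH COMPRESSION,
         ENCRYPTION (ALGORITHM = AES_256, SERVER CERTIFICATE = BackupCert);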


Finding Compressible Indexes

Tracy Boggiano has a script to help you figure out which indexes make sense to compress:

We can write a procedure to check periodically whether a table will benefit from compression or not.  There are a few tricks, though:

  1. If you have sparse columns, you cannot compress the table; we check for that in lines 70-74.
  2. Next, if the index type is 1 (a clustered index), it cannot contain the LOB data types image, ntext, or text.  These days that shouldn’t be a problem; they have, after all, been deprecated, so you aren’t using them.  We check for these in lines 76-84.
  3. Finally, we perform the same test for LOB data on index type 2 (nonclustered indexes) in lines 85-95.

If everything checks out OK, we call our second procedure to calculate the percentage of space we would save, and if we would save more than 25% we create a script to compress the index.  We also have a @Debug parameter in here so it can just show us what code it would execute, so we can see what it will do.

Click through for the code for the two procedures Tracy has created.
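A heavily compressed-down sketch of the estimate-then-decide step (not Tracy's code; object and index names are placeholders) might look like this for a single index:

-- Capture the estimate for one index, then compare against a 25% savings threshold.
DECLARE @CurrentKB BIGINT, @CompressedKB BIGINT;
DECLARE @Estimate TABLE (
    object_name sysname, schema_name sysname, index_id INT, partition_number INT,
    current_size_kb BIGINT, compressed_size_kb BIGINT,
    sample_current_size_kb BIGINT, sample_compressed_size_kb BIGINT);

INSERT INTO @Estimate
EXEC sys.sp_estimate_data_compression_savings 'dbo', 'SomeTable', 1, NULL, 'PAGE';

SELECT @CurrentKB = SUM(current_size_kb), @CompressedKB = SUM(compressed_size_kb)
FROM @Estimate;

IF @CompressedKB < @CurrentKB * 0.75   -- would save more than 25%
    PRINT 'Worth compressing: generate an ALTER INDEX ... REBUILD WITH (DATA_COMPRESSION = PAGE) statement.';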
