Press "Enter" to skip to content

Category: Compression

Compression Performance

Rolf Tesmer digs into the case of compressing an index whose leading column has low cardinality:

That first one is a cracker – it hit me once when compressing a SQL Server table (600M+ rows) on a 64-core Enterprise SQL Server. After benchmarking several other data compression activities I thought I had a basic “rule of thumb” (based on GB data size and number of rows)… which just happened to be coincidence!

This also begs the question: why would you use low-selectivity indexes? Well, I can think of a few cases – but the one which stands out the most is the identification of a small number of rows within a greater collection – such as an index on TYPE columns (i.e., [ProcessingStatusFlag] CHAR(1) = [P]rocessed, [U]nprocessed, [W]orking, [F]ailed, etc.)

… AND SO – let's do some testing to validate this puppy!

There’s a significant difference here, so check out Rolf’s post for the details.
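
To make Rolf's scenario concrete, here is a minimal sketch (the table, column, and index names are hypothetical, not from his post) of the kind of low-selectivity status-flag index in question, built with page compression:

-- Hypothetical status-flag table and low-selectivity index (not from Rolf's post).
CREATE TABLE dbo.OrderQueue
(
    OrderID BIGINT NOT NULL PRIMARY KEY,
    ProcessingStatusFlag CHAR(1) NOT NULL  -- [P]rocessed, [U]nprocessed, [W]orking, [F]ailed
);

CREATE NONCLUSTERED INDEX IX_OrderQueue_ProcessingStatusFlag
    ON dbo.OrderQueue (ProcessingStatusFlag)
    WITH (DATA_COMPRESSION = PAGE);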


Page Compression

Andy Mallon continues his discussion of compression options:

You can think of page compression as doing data deduplication within a page. If there is some value repeated in multiple spots on a page, then page compression can store the repetitive value only once, and save some space.

Page compression is actually a process that combines three different compression algorithms into a bigger algorithm. Page compression applies these three algorithms in order:
1) Row compression
2) Prefix compression
3) Dictionary compression

Page compression is my go-to compression option, typically.  There are some cases in which it doesn’t work well, so check beforehand (start with sp_estimate_data_compression_savings), but I’ve had good luck with page compression.
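
If you want to try page compression on one of your own tables, a minimal sketch (the object names here are hypothetical) looks like this:

-- Rebuild an entire table with page compression (object names are hypothetical).
ALTER TABLE dbo.SalesOrderDetail
    REBUILD WITH (DATA_COMPRESSION = PAGE);

-- Or target a single index instead.
ALTER INDEX IX_SalesOrderDetail_ProductID
    ON dbo.SalesOrderDetail
    REBUILD WITH (DATA_COMPRESSION = PAGE);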


Row-Level Compression

Andy Mallon explains row-level compression:

You can think of row compression as working by treating certain fixed-length data types as variable-length data types. By removing certain metadata, NULL and 0 values, and the padding of fixed-length values, SQL Server can reduce the total size of a row.

The easiest way to think of it is that char(n) no longer takes n bytes for every row, but instead gets treated more like varchar(n) where the storage used varies for each value. The behavior for each data type varies, with some data types getting more or less (or no) savings compared to others.

Row-level compression is the “safer” of the two primary compression options, but I almost never use it.  That might just be a function of my particular workloads, of course.
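
Enabling row compression looks much the same as enabling page compression. A minimal sketch (the table name is hypothetical), with a before-and-after size check:

-- Check the table's size before compressing (table name is hypothetical).
EXEC sp_spaceused N'dbo.AuditLog';

-- Rebuild the table with row compression.
ALTER TABLE dbo.AuditLog
    REBUILD WITH (DATA_COMPRESSION = ROW);

-- Check the size again afterward.
EXEC sp_spaceused N'dbo.AuditLog';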


Compress An Entire Database

Shaun J. Stuart has a script which compresses all (compression-worthy) objects in a database:

Reader Dick H. posted a comment on my last version of this script stating that he got an error when this was run against tables containing sparse columns. Data compression does not support tables with sparse columns, so they should be excluded from this process. I’ve modified this script to correct this. I don’t have any tables with sparse columns in my environment, so thanks to Dick for pointing this out!

For instructions on using this script, look here.

This is a very useful script to have in your back pocket.
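
As an aside (and this is not Shaun's script), if you want to see which tables in your own database contain sparse columns and therefore need to be skipped, a quick check against the catalog views looks something like this:

-- Find tables containing sparse columns, which data compression does not support.
SELECT DISTINCT s.name AS schema_name, t.name AS table_name
FROM sys.tables AS t
    INNER JOIN sys.schemas AS s ON s.schema_id = t.schema_id
    INNER JOIN sys.columns AS c ON c.object_id = t.object_id
WHERE c.is_sparse = 1;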


Data Compression

Andy Mallon looks at the costs and benefits of data compression:

The obvious benefit is that compressed data takes up less space on disk. Since you probably keep multiple copies of your database (multiple environments, DR, backups, etc), this space savings can really add up. High-performance enterprise-class storage is expensive. Compressing your data to reduce footprint can have a very real benefit to your budget. I once worked on an SAP ERP database that was 12TB uncompressed, and was reduced to just under 4TB after we implemented compression.

My experience with compression is that the benefit vastly outweighs the cost.  Do your own testing, of course.


Data Compression

Corey Beck on data compression:

Before we jump right into enabling either row or page compression, we can actually estimate the savings of each to determine which will provide us with the most savings on storage.  Since page compression includes row compression, we will start with row compression and the estimated savings.

EXEC sp_estimate_data_compression_savings
    'Person', 'Person', NULL, NULL, 'row';
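
For comparison, the page compression estimate uses the same parameters with 'page' in place of 'row':

EXEC sp_estimate_data_compression_savings
    'Person', 'Person', NULL, NULL, 'page';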

In practice, data compression is extremely valuable and in most circumstances, the benefits outweigh the costs.  In certain workloads, you might even see CPU usage go down.


Power Pivot Compression

There might be a theme to today’s posts…

Matt Allington shows us compression in Power Pivot:

Power Pivot would end up storing a table that looks more like the black table above (rather than the blue one), keeping just the minimum amount of information it needs to rebuild the real table of data on the fly when and if required. If the black RLE table ended up taking more space than the original column of data, then there would be no benefit to RLE and the original column of data would be stored. Power Pivot may use one or more of the other compression techniques as well as, or instead of, RLE – it all depends on the specifics of the actual data.

This is a very interesting look at ways the Power Pivot team optimize data storage.
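
As a toy illustration only (this is T-SQL, not how the VertiPaq engine actually stores anything), here is roughly what a run-length-encoded version of a small sorted column looks like:

-- Toy run-length encoding: store each value once with its starting position and
-- run length instead of repeating it row by row. (This works here only because
-- each colour forms a single contiguous run in the sample data.)
WITH SampleData AS
(
    SELECT *
    FROM (VALUES (1, 'Red'), (2, 'Red'), (3, 'Red'),
                 (4, 'Blue'), (5, 'Blue'), (6, 'Green')) AS v (RowID, Colour)
)
SELECT Colour,
       MIN(RowID) AS StartRow,
       COUNT(*)   AS RunLength
FROM SampleData
GROUP BY Colour
ORDER BY StartRow;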


Calculating Partition Sizes

Rolf Tesmer has a nice series on partitioning going. His latest entry involves calculating partition sizes in advance:

Sometimes (just sometimes) you need to calculate the size of your table partitions upfront, before you actually go to the pain and effort of partitioning (or repartitioning) a table.  Doing this helps with pre-sizing the database files in advance instead of having them auto-grow many, many times over in small increments as you cut data over into the partitions.
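
Rolf's post has the full method; as a rough sketch of the idea (the table and column names are hypothetical), you can at least get row counts per prospective partition boundary before you commit:

-- Rough sketch only (not Rolf's script): row counts per prospective yearly
-- partition, as a starting point for pre-sizing files.
SELECT YEAR(OrderDate) AS PartitionYear,
       COUNT_BIG(*)    AS RowsInPartition
FROM dbo.Orders
GROUP BY YEAR(OrderDate)
ORDER BY PartitionYear;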

Check out the entire series.
