R Model Compression

Kevin Feasel



I have a post showing off some of the value of compressing R models:

So right now, we’re burning roughly 200K per model.  My stated goal is to be able to store several years worth of data for 10 million products.  Let’s say that I need 10 million products in ProductModel and 1 billion rows in ProductModelHistory.  That means that we’d end up with 1.86 TB of data in the ProductModel table and 186 TB in ProductModelHistory.  This seems…excessive.

As a result, I decided to try using the COMPRESS() function in SQL Server 2016.  The COMPRESS function simply uses GZip compression.  Yeah, there are compression algorithms which tend to be more compact (e.g., bz2 or 7z), but GZip is relatively CPU efficient and I can wrap my SQL statements with COMPRESS() and DECOMPRESS() and not have to change any calling code; I just need to update the two stored procedures I use to insert and then retrieve product models.

Most of the time, it’s not a big deal.  But once you start talking hundreds of gigabytes or in my case, a couple hundred terabytes, it’s definitely worth compressing this data.

Related Posts

Microsoft R Open 3.5.1

David Smith announces Microsoft R Open 3.5.1: Microsoft R Open 3.5.1 has been released, combining the latest R language engine with multi-processor performance and tools for managing R packages reproducibly. You can download Microsoft R Open 3.5.1 for Windows, Mac and Linux from MRAN now. Microsoft R Open is 100% compatible with all R scripts and packages, and works with […]

Read More

Performing Linear Regression With Power BI

Jason Cantrell shows how to create a simple linear regression in Power BI: Linear Regression is a very useful statistical tool that helps us understand the relationship between variables and the effects they have on each other. It can be used across many industries in a variety of ways – from spurring value to gaining […]

Read More


July 2017
« Jun Aug »