R Model Compression

Kevin Feasel

2017-07-20

R

I have a post showing off some of the value of compressing R models:

So right now, we’re burning roughly 200K per model.  My stated goal is to be able to store several years worth of data for 10 million products.  Let’s say that I need 10 million products in ProductModel and 1 billion rows in ProductModelHistory.  That means that we’d end up with 1.86 TB of data in the ProductModel table and 186 TB in ProductModelHistory.  This seems…excessive.

As a result, I decided to try using the COMPRESS() function in SQL Server 2016.  The COMPRESS function simply uses GZip compression.  Yeah, there are compression algorithms which tend to be more compact (e.g., bz2 or 7z), but GZip is relatively CPU efficient and I can wrap my SQL statements with COMPRESS() and DECOMPRESS() and not have to change any calling code; I just need to update the two stored procedures I use to insert and then retrieve product models.

Most of the time, it’s not a big deal.  But once you start talking hundreds of gigabytes or in my case, a couple hundred terabytes, it’s definitely worth compressing this data.

Related Posts

The Lesser-Known Apply Functions In R

Andrew Treadway covers a few of the lesser-known apply functions in R: rapply Let’s start with rapply. This function has a couple of different purposes. One is to recursively apply a function to a list. We’ll get to that in a moment. The other use of rapply is to a apply a function to only those elements in […]

Read More

Controlling Azure Services In R With AzureR

Hong Ooi announces a new set of packages called AzureR: As background, some of you may remember the AzureSMR package, which was written a few years back as an R interface to Azure. AzureSMR was very successful and gained a significant number of users, but it was never meant to be maintainable in the long term. As […]

Read More

Categories

July 2017
MTWTFSS
« Jun Aug »
 12
3456789
10111213141516
17181920212223
24252627282930
31