There are 17 new stored procedures in the SSIS Catalog in SQL Server 2016 CTP 3.0. Most of them cover changes related to AlwaysOn and Customized Logging Levels, but there are some others. Let’s see.
There are also procedures to support incremental package deployment, so check out the post.
There might be a theme to today’s posts…
Power Pivot would end up storing a table that looks more like the black table above (rather than the blue one), keeping just the minimum amount of information it needs to rebuild the real table of data on the fly when and if required. If the black RLE table ended up taking more space than the original column of data, then there would be no benefit to RLE and the original column of data would be stored. Power Pivot may use one or more of the other compression techniques as well as, or instead of, RLE – it all depends on the specifics of the actual data.
This is a very interesting look at ways the Power Pivot team optimize data storage.
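To make the fallback behavior concrete, here is a minimal sketch of run-length encoding with the "keep the raw column if RLE is bigger" check described above. This is illustrative Python, not the Power Pivot implementation; the size comparison (counting list entries rather than bytes) is a simplifying assumption.

```python
def rle_encode(values):
    """Run-length encode a column as (value, run_length) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([v, 1])       # start a new run
    return runs

def compress_column(values):
    """Keep the RLE form only if it is smaller than the raw column,
    mirroring the fallback described in the quote above."""
    runs = rle_encode(values)
    # Each run stores two items, so compare 2 * runs against the raw length.
    if 2 * len(runs) < len(values):
        return ("rle", runs)
    return ("raw", list(values))
```

A long run of repeated values (`["red"] * 5 + ["blue"] * 3`) compresses to two runs, while a column of all-distinct values falls back to raw storage.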
For example, given a gzip file that contains a single csv file, here’s an example M query showing how the Binary.Decompress() function can be used to extract the csv file from the gzip file and then treat the contents of the csv file as a table:
He goes on to show how Binary.Decompress is used to read Excel XLSX files.
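For readers outside the M world, the same gzip-wrapping-a-csv pattern looks like this in Python, using the standard-library gzip and csv modules; this is an analogue of the Binary.Decompress approach, not the M code from the post.

```python
import csv
import gzip

def read_gzipped_csv(path):
    """Decompress a gzip file containing a single csv file and
    return its contents as a list of rows."""
    # mode="rt" decompresses and decodes to text in one step,
    # so csv.reader can consume the stream directly.
    with gzip.open(path, mode="rt", newline="") as f:
        return list(csv.reader(f))
```

The key idea matches the M query: the decompression step yields a binary/text stream that the csv parser then treats as an ordinary table.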
Warning, this method involves the use of .NET – proceed with caution! That said, please proceed because the results are certainly worth it.
The same 32MB file that took 5 minutes using the first method and 2 minutes using the second method now took only ~3 seconds using this method. Also, the processing time is essentially linear – so even when we bump it up to 4 files (or ~128MB of data) it’s only going to take ~12 seconds. This would have brought us down to under 5 minutes to process all 3GB of data.
Again, I’m not a .NET expert (not really even a novice), so the code I’m sharing is most likely terrible…but it works (at least for me) and it’s fast. So here it is…read through it and feel free to heckle me in the comments 😉
Don’t heckle Bill; thank him.
These functions use the standard GZip algorithm, so you can compress data in the application layer/client side and send compressed data to SQL Server, or you can compress data in a SELECT query and DECOMPRESS it on the client side.
This seems potentially useful if you’re going to store blobs of data you don’t need to parse.
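Because COMPRESS and DECOMPRESS speak standard GZip, a client in any language with a gzip library can produce or consume those blobs. Here is a hedged Python sketch of the client side of that round trip; it assumes the text was stored as UTF-8, and the actual database insert/select (e.g. via pyodbc into a VARBINARY(MAX) column) is omitted.

```python
import gzip

def compress_for_sql_server(text: str) -> bytes:
    """Gzip text client-side; the resulting bytes can be stored in a
    VARBINARY(MAX) column and expanded server-side with DECOMPRESS(),
    since both sides use the standard GZip format."""
    return gzip.compress(text.encode("utf-8"))

def decompress_from_sql_server(blob: bytes) -> str:
    """Expand a gzip blob (e.g. one produced by T-SQL COMPRESS())
    back into text on the client."""
    return gzip.decompress(blob).decode("utf-8")
```

For repetitive data the savings can be substantial; for already-compressed blobs (images, zip archives) there is little to gain.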