Press "Enter" to skip to content

Month: August 2017

Using The COMPRESS Function In SQL Server

Kendra Little explains the COMPRESS() function in SQL Server 2016:

One cool little feature in SQL Server 2016 is COMPRESS(). It’s a TSQL function available in all editions that shrinks down data using the GZIP algorithm (documentation).

Things to know about COMPRESS():

  • Compressed data is in the VARBINARY(max) data type

  • You get the data “back to normal” by using the DECOMPRESS function – which also outputs VARBINARY(max)

  • You can’t use columns of the VARBINARY(max) type in an index key column – but it may be useful to use the column as a filter in a filtered index, in some cases

COMPRESS() uses standard GZip compression, so you could use methods other than DECOMPRESS() to inflate the data—for example, bring the compressed data out to your application and use language-specific GZip libraries to decompress the data.  Read the whole thing.
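
As a quick illustration of the round trip (my sketch, not Kendra’s; the variable names are placeholders), in T-SQL:

    -- Compress an NVARCHAR value into VARBINARY(MAX) and read it back.
    DECLARE @original NVARCHAR(MAX) = REPLICATE(CAST(N'SQL Server 2016 ' AS NVARCHAR(MAX)), 1000);
    DECLARE @compressed VARBINARY(MAX) = COMPRESS(@original);

    -- DECOMPRESS also returns VARBINARY(MAX), so cast back to the original type.
    SELECT
        DATALENGTH(@original)   AS original_bytes,
        DATALENGTH(@compressed) AS compressed_bytes,
        CAST(DECOMPRESS(@compressed) AS NVARCHAR(MAX)) AS round_tripped;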

Comments closed

Stream Analytics Into Power BI

Rolf Tesmer shows off how to use Azure Stream Analytics to push data in real time via the Power BI API into your Power BI dashboard:

You can push data to the Power BI streaming dataset API in a few ways… but they generally boil down to these 3 options

  1. Directly call the API from code
  2. Directly call the API from an Azure Logic App
  3. Use Azure Stream Analytics to push data into the API

This blog post extends on my previous post – and thus I will be leveraging Option #3 above.

Definitely worth checking out if you are interested in real-time Power BI dashboards.
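
For context, the Stream Analytics side of Option #3 boils down to a query that SELECTs INTO an output alias bound to the Power BI streaming dataset. A hedged sketch of my own (the input/output alias names and fields are hypothetical, and the aliases themselves are configured on the Stream Analytics job, not in the query):

    -- Aggregate events per device over a 10-second tumbling window
    -- and push each result row to the Power BI output.
    SELECT
        DeviceId,
        COUNT(*)         AS EventCount,
        AVG(Temperature) AS AvgTemperature,
        System.Timestamp AS WindowEnd
    INTO
        [powerbi-output]
    FROM
        [eventhub-input] TIMESTAMP BY EventTime
    GROUP BY
        DeviceId,
        TumblingWindow(second, 10)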

Comments closed

A Game Written In SQL Server

Daniel Janik has built a game that runs entirely in SQL Server:

As it turns out, SQL Server Management Studio (SSMS) can display many types of emoji, and this is the key.

Keeping in mind that the whole thing was not written with performance or best practice in mind, I’d like to introduce the world to the very first action adventure game 100% written and played in SQL Server!

The goal here is to have a game which helps teach some basic development practices.  Interesting concept.
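
If you are wondering how emoji end up in a results grid at all: NVARCHAR stores UTF-16, so characters outside the Basic Multilingual Plane are held as surrogate pairs, and SSMS renders them when the grid font has the glyphs. A tiny sketch of my own, nothing to do with Daniel’s actual game code:

    -- Emoji in T-SQL: supplementary characters are stored as UTF-16 surrogate pairs.
    DECLARE @smiley NVARCHAR(4) = NCHAR(0xD83D) + NCHAR(0xDE00);  -- U+1F600 built from a surrogate pair
    DECLARE @sword  NVARCHAR(4) = N'🗡';                           -- a literal emoji works too

    SELECT @smiley + N' found a ' + @sword AS game_output;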

Comments closed

Using Startup Stored Procedures

John Morehouse explains how to set up a startup stored procedure, as well as some of the risks involved:

Startup procedures automatically execute whenever SQL Server is started.  Where would you use this?  One example: if you had an application that required a global temporary table, you could use a startup procedure to create it.  This would allow the table to be immediately accessible to any application that requires it.

Another possibility is that you could “warm up” the cache so that data is already cached when an application needs it.  This would help prevent that initial I/O hit the first time the query is called, thus potentially improving initial performance.

Click through for details.  I’ve created a couple of these, but they rarely come to mind, and that’s one of the big problems: if there is an issue, quite a bit of investigative work will probably take place before somebody remembers that, hey, these things exist.
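
For reference, flagging a procedure as a startup procedure is done with sp_procoption; the procedure must live in master and cannot take parameters. A minimal sketch with a hypothetical warm-up query:

    USE master;
    GO

    -- A parameterless procedure in master: the only kind that can run at startup.
    CREATE PROCEDURE dbo.usp_WarmCache
    AS
    BEGIN
        -- Hypothetical warm-up: read a frequently used table so its pages land in the buffer pool.
        SELECT COUNT_BIG(*) FROM MyAppDb.dbo.ImportantTable;
    END;
    GO

    -- Mark it to execute automatically every time SQL Server starts.
    EXEC sp_procoption
        @ProcName    = N'dbo.usp_WarmCache',
        @OptionName  = 'startup',
        @OptionValue = 'on';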

Comments closed

Scaling Out Random Forest

Denis C. Bauer, et al., explain VariantSpark RF, a random forest algorithm designed for huge numbers of variables:

VariantSpark RF starts by randomly assigning subsets of the data to Spark Executors for decision tree building (Fig 1). It then calculates the best split over all nodes and trees simultaneously. This implementation avoids communication bottlenecks between Spark Driver and Executors as information exchange is minimal, allowing it to build large numbers of trees efficiently. This surveys the solution space appropriately to cater for millions of features and thousands of samples.

Furthermore, VariantSpark RF has a memory-efficient representation of genomics data, optimized communication patterns, and computation batching. It also provides an efficient implementation of Out-Of-Bag (OOB) error, which substantially simplifies parameter tuning over the computationally more costly alternative of cross-validation.

We implemented VariantSpark RF in Scala as it is the most performant interface language to Apache Spark. Also, new updates to Spark and the interacting APIs will be deployed in Scala first, which has been important when working on top of a fast-evolving framework.

Give it a read.  Thankfully, I exhibit few of the traits of the degenerative disease known as Hipsterism.

Comments closed

sparklyr 0.6 Released

Javier Luraschi announces sparklyr 0.6:

We’re excited to announce a new release of the sparklyr package, available in CRAN today! sparklyr 0.6 introduces new features to:

  • Distribute R computations using spark_apply() to execute arbitrary R code across your Spark cluster. You can now use all of your favorite R packages and functions in a distributed context.

  • Connect to External Data Sources using spark_read_source(), spark_write_source(), spark_read_jdbc(), and spark_write_jdbc().

  • Use the Latest Frameworks including dplyr 0.7, DBI 0.7, RStudio 1.1, and Spark 2.2.

I’ve been impressed with sparklyr so far.

Comments closed

Reducing Dimensionality

Antoine Guillot explains some of the basic concepts of variable reduction in a data analysis:

Each of these people can be represented as a point in a 3-dimensional space. With a gross approximation, each person fits in a 50*50*200 (cm) cube. If we use a resolution of 1 cm and three color channels, then each person can be represented by 1,000,000 variables.
On the other hand, the shadow is only in 2 dimensions and in black and white, so each shadow only needs 50*200=10,000 variables.
The number of variables was divided by 100! And if your goal is to detect human vs. cat, or even men vs. women, the data from the shadow may be enough.

Read on for intuitive discussions of techniques like principal component analysis and linear discriminant analysis.  H/T R-Bloggers

Comments closed

Moving SQL Server Data Files

Jana Sattainathan walks us through the process of moving a SQL Server data file from one drive to another:

Space got tight on a drive and I knew that there was space on another drive. I had already set up this particular database with multiple secondary filegroups/files (.ndf files) instead of a single, huge .mdf file (which would have made the whole thing a lot harder).

The procedure to relocate data files is extremely simple.

Click through for Jana’s seven salvos of administrator success.
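
The T-SQL portion generally looks like this; the database name, logical file name, and target path below are placeholders, and the physical copy happens at the operating system level while the database is offline:

    -- 1. Point the system catalog at the new location (takes effect when the database next comes online).
    ALTER DATABASE SalesDb
        MODIFY FILE (NAME = SalesDb_Data2, FILENAME = N'E:\SQLData\SalesDb_Data2.ndf');

    -- 2. Take the database offline so the file can be moved.
    ALTER DATABASE SalesDb SET OFFLINE WITH ROLLBACK IMMEDIATE;

    -- 3. Copy or move the physical .ndf file to E:\SQLData\ in Windows, then:

    -- 4. Bring the database back online; it opens the file from the new path.
    ALTER DATABASE SalesDb SET ONLINE;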

Comments closed

Automating tSQLt Tests

James Anderson shows how to integrate tSQLt tests as part of his ReadyRoll pipeline:

By default, ReadyRoll will ignore tSQLt objects, including our tests. We don’t want ReadyRoll to script out the tSQLt objects, but we do want it to script our tests. To set our filter, we need to unload the project in VS and edit the project file. Add the following to the section named ReadyRoll Script Generation Section:

James’s series is really coming together at this point, so if you haven’t been reading, check out the links in his post.
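
If you haven’t seen tSQLt itself, a test is simply a stored procedure whose name starts with “test” inside a test-class schema. A minimal, entirely hypothetical example (the function under test is made up):

    -- Create a test class: a schema that tSQLt treats as a container for tests.
    EXEC tSQLt.NewTestClass 'CustomerTests';
    GO

    -- A test is a stored procedure in that schema whose name starts with 'test'.
    CREATE PROCEDURE CustomerTests.[test TrimCustomerName removes leading spaces]
    AS
    BEGIN
        DECLARE @actual NVARCHAR(100) = dbo.TrimCustomerName(N'  Jane');  -- hypothetical function under test

        EXEC tSQLt.AssertEqualsString @Expected = N'Jane', @Actual = @actual;
    END;
    GO

    -- Run every test in the class.
    EXEC tSQLt.Run 'CustomerTests';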

Comments closed

T-SQL Tuesday 92 Roundup

Raul Gonzalez wraps up T-SQL Tuesday #92:

For July 2017’s event, the proposed topic asked all of you to share those little secrets that made your tummy burn after pressing F5.

Since early in the morning I’ve been reading your posts, which makes me very happy and makes me feel the topic was certainly well received by the community.

In order of publication date, these are the posts that took part in this month’s event.

Click through to see the 17 entries this month.

Comments closed