
Curated SQL Posts

Multi-Migration To Azure

Kevin Hill shows how to migrate multiple databases to Azure SQL Database:

Since I might be doing a similar thing in the near future, I decided to see if I could push 4-5 small test databases to Azure SQLDB at the same time (SQL on Azure VM test may come later).  If this happens with my client it will be hundreds of databases, with the same schema but wildly varying sizes.

The “Plan”

  1. Create 5 sample databases

  2. Create .bacpacs on local drive

  3. Create Azure Logical Server if not already done (see step 4)

  4. Create a Notepad of SQLPackage.exe commands, one for each .bacpac

  5. Run with “start” in CMD to fire off separate windows

  6. Wait.

  7. Enjoy.

Worth reading the whole thing.
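
For illustration, step 4 could even be generated from SQL Server itself. Here is a rough, hypothetical sketch that emits one “start”-prefixed SqlPackage.exe import command per test database; the server name, credentials, file paths, and MigrateTest naming are all made up:

-- Generate one CMD line per test database; paste the output into a .cmd file (step 4),
-- then run it so each import gets its own window (step 5).
SELECT 'start "" SqlPackage.exe /Action:Import'
     + ' /SourceFile:"C:\bacpacs\' + name + '.bacpac"'
     + ' /TargetServerName:"myserver.database.windows.net"'
     + ' /TargetDatabaseName:"' + name + '"'
     + ' /TargetUser:"azureadmin" /TargetPassword:"<password>"'
FROM sys.databases
WHERE name LIKE 'MigrateTest%';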


Resizing A Linux Partition

Steve Jones shows how to add disk space to a Linux partition:

While working with some SQL Server 2017 tests, I ran out of disk space. I tend to size my VMs around 40GB, and that works for some things, but I’ll run out of space.

I needed to expand the VMware disk. That doesn’t mean Linux sees the space directly, and I had to figure out how to make the partition bigger. I could have added another disk, but I wanted to work through this process. I learned I needed to have an inactive partition, so I downloaded gparted on a live CD and booted to that.

Steve uses the GUI approach; in the comments, David Klee links to his CLI approach.


Using The COMPRESS Function In SQL Server

Kendra Little explains the COMPRESS() function in SQL Server 2016:

One cool little feature in SQL Server 2016 is COMPRESS(). It’s a TSQL function available in all editions that shrinks down data using the GZIP algorithm (documentation).

Things to know about COMPRESS():

  • Compressed data is in the VARBINARY(max) data type

  • You get the data “back to normal” by using the DECOMPRESS function – which also outputs VARBINARY(max)

  • You can’t use columns of the VARBINARY(max) type in an index key column – but it may be useful to use the column as a filter in a filtered index, in some cases

COMPRESS() uses standard GZip compression, so you could use methods other than DECOMPRESS() to inflate the data—for example, bring the compressed data out to your application and use language-specific GZip libraries to decompress the data.  Read the whole thing.
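
A quick sketch of the round trip (the sample string is arbitrary):

-- COMPRESS returns VARBINARY(MAX); DECOMPRESS also returns VARBINARY(MAX),
-- so cast the result back to the original type.
DECLARE @original nvarchar(max) = REPLICATE(CAST(N'Curated SQL ' AS nvarchar(max)), 1000);
DECLARE @compressed varbinary(max) = COMPRESS(@original);

SELECT
    DATALENGTH(@original)   AS OriginalBytes,
    DATALENGTH(@compressed) AS CompressedBytes,
    CAST(DECOMPRESS(@compressed) AS nvarchar(max)) AS RoundTripped;

Highly repetitive input like this compresses dramatically; real-world savings depend on your data.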


Stream Analytics Into Power BI

Rolf Tesmer shows off how to use Azure Stream Analytics to push data in real time via the Power BI API into your Power BI dashboard:

You can push data to the Power BI streaming dataset API in a few ways… but they generally boil down to these 3 options

  1. Directly call the API from code
  2. Directly call the API from an Azure Logic App
  3. Use Azure Stream Analytics to push data into the API

This blog post extends on my previous post – and thus I will be leveraging Option #3 above.

Definitely worth checking out if you are interested in real-time Power BI dashboards.
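
For context, option #3 generally amounts to a Stream Analytics query whose INTO clause targets a Power BI output defined on the job. A minimal sketch, assuming an Event Hub input aliased [eventhub-input], a Power BI output aliased [powerbi-output], and an EventTime field in the payload:

-- Aggregate arriving events into 10-second windows and stream each result row to Power BI.
SELECT
    System.Timestamp AS WindowEnd,
    COUNT(*) AS EventCount
INTO [powerbi-output]
FROM [eventhub-input] TIMESTAMP BY EventTime
GROUP BY TumblingWindow(second, 10)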


A Game Written In SQL Server

Daniel Janik has built a game that runs entirely in SQL Server:

As it turns out, SQL Server Management Studio (SSMS) can display many types of emoji, and this is the key.

Keeping in mind that the whole thing was not written with performance or best practice in mind, I’d like to introduce the world to the very first action adventure game 100% written and played in SQL Server!

The goal here is to have a game which helps teach some basics of development practices.  Interesting concept.
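
If you want to see the emoji piece for yourself, even a one-liner renders in the SSMS results grid (this query is a made-up illustration, not part of Daniel’s game):

-- NVARCHAR (N'...') literals are required; emoji are stored as surrogate pairs.
SELECT N'🧙' AS Hero, N'🗡' AS Weapon, N'🐉' AS Monster;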


Using Startup Stored Procedures

John Morehouse explains how to set up a startup stored procedure, as well as some of the risks involved:

Startup procedures automatically execute whenever SQL Server is started.  Where would you use this?  One example is if you had an application that required the use of a global temporary table, you could use a startup procedure to create it.  This would allow the table to be immediately accessible to any application that requires it.

Another possibility is that you could “warm up” the cache so that data is already cached when an application needs it.  This would help prevent that initial I/O hit the first time the query is called, thus potentially improving initial performance.

Click through for details.  I’ve created a couple of these, but they rarely come to mind, and that’s one of the big problems: if there is an issue, there’s probably quite a bit of investigative work that would take place before somebody remembers that, hey, these things exist.
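
As a minimal sketch of the global temp table example (procedure and table names are hypothetical), the procedure lives in master, takes no parameters, and gets flagged with sp_procoption:

USE master;
GO
-- Startup procedures must live in master and take no parameters.
CREATE OR ALTER PROCEDURE dbo.CreateAppGlobalTemp
AS
BEGIN
    CREATE TABLE ##AppLookup (LookupKey int PRIMARY KEY, LookupValue varchar(100));
END;
GO
-- Flag it to run automatically every time the instance starts.
EXEC sp_procoption @ProcName = N'dbo.CreateAppGlobalTemp',
                   @OptionName = 'startup',
                   @OptionValue = 'on';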


Scaling Out Random Forest

Denis C. Bauer, et al., explain VariantSpark RF, a random forest algorithm designed for huge numbers of variables:

VariantSpark RF starts by randomly assigning subsets of the data to Spark Executors for decision tree building (Fig 1). It then calculates the best split over all nodes and trees simultaneously. This implementation avoids communication bottlenecks between Spark Driver and Executors as information exchange is minimal, allowing it to build large numbers of trees efficiently. This surveys the solution space appropriately to cater for millions of features and thousands of samples.

Furthermore, VariantSpark RF has memory efficient representation of genomics data, optimized communication patterns and computation batching. It also provides efficient implementation of Out-Of-Bag (OOB) error, which substantially simplifies parameter tuning over the computationally more costly alternative of cross-validation.

We implemented VariantSpark RF in Scala as it is the most performant interface language to Apache Spark. Also, new updates to Spark and the interacting APIs will be deployed in Scala first, which has been important when working on top of a fast-evolving framework.

Give it a read.  Thankfully, I exhibit few of the traits of the degenerative disease known as Hipsterism.


sparklyr 0.6 Released

Javier Luraschi announces sparklyr 0.6:

We’re excited to announce a new release of the sparklyr package, available in CRAN today! sparklyr 0.6 introduces new features to:

  • Distribute R computations using spark_apply() to execute arbitrary R code across your Spark cluster. You can now use all of your favorite R packages and functions in a distributed context.

  • Connect to External Data Sources using spark_read_source(), spark_write_source(), spark_read_jdbc() and spark_write_jdbc().

  • Use the Latest Frameworks including dplyr 0.7, DBI 0.7, RStudio 1.1 and Spark 2.2.

I’ve been impressed with sparklyr so far.


Reducing Dimensionality

Antoine Guillot explains some of the basic concepts of variable reduction in a data analysis:

Each of these people can be represented as points in a 3-dimensional space. With a gross approximation, each person is in a 50*50*200 (cm) cube. If we use a resolution of 1cm and three color channels, then each person can be represented by 1,000,000 variables.
On the other hand, the shadow is only in 2 dimensions and in black and white, so each shadow only needs 50*200=10,000 variables.
The number of variables was divided by 100! And if your goal is to detect human vs cat, or even men vs women, the data from the shadow may be enough.

Read on for intuitive discussions of techniques like principal component analysis and linear discriminant analysis.  H/T R-Bloggers


Moving SQL Server Data Files

Jana Sattainathan walks us through the process of moving a SQL Server data file from one drive to another:

Space got tight on a drive and I knew that there was space on another drive. I had already set up this particular database with multiple secondary file groups/files (.ndf files) instead of a single, huge .mdf file (which would have made the whole thing a lot harder).

The procedure to relocate data files is extremely simple

Click through for Jana’s seven salvos of administrator success.
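
In outline, the usual T-SQL side of that process looks something like this (database name, logical file name, and paths are hypothetical):

-- Point the catalog at the new location for the secondary data file.
ALTER DATABASE SalesDB
    MODIFY FILE (NAME = SalesDB_Data2, FILENAME = 'E:\SQLData\SalesDB_Data2.ndf');

-- Take the database offline, move the physical .ndf file to E:\SQLData\ at the OS level,
-- then bring it back online so SQL Server opens the file from its new home.
ALTER DATABASE SalesDB SET OFFLINE WITH ROLLBACK IMMEDIATE;
ALTER DATABASE SalesDB SET ONLINE;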
