Press "Enter" to skip to content

Author: Kevin Feasel

Use Source Control

James Anderson wants you to use source control:

SSC and SSDT require the use of compare tools to build deployment scripts. This is referred to as a state based migration. I’d done deployments like this in the past and saw that people reviewing the release found it difficult to review these scripts when the changes were more than trivial. For this reason, I decided to look at some migration based solutions. Migration solutions generate scripts during the development process that will be used to deploy changes to production. This allows the developer to break the changes down into small manageable individual scripts which in turn makes code reviews easier and deployments feel controlled. These scripts sit in the VS project and are therefore source controlled in the same way as the database.

James recommends Git here.  I’m not Git’s biggest fan, but it’s much, much better than not having any source control at all.

Comments closed

The Basics Of Notebooks

I have a quick walkthrough of notebooks:

Remember chemistry class in high school or college?  You might remember having to keep a lab notebook for your experiments.  The purpose of this notebook was two-fold:  first, so you could remember what you did and why you did each step; second, so others could repeat what you did.  A well-done lab notebook has all you need to replicate an experiment, and independent replication is a huge part of what makes hard sciences “hard.”

Take that concept and apply it to statistical analysis of data, and you get the type of notebook I’m talking about here.  You start with a data set, perform cleansing activities, potentially prune elements (e.g., getting rid of rows with missing values), calculate descriptive statistics, and apply models to the data set.

I didn’t realize just how useful notebooks were until I started using them regularly.

Comments closed

Going From Pig To Spark

Philippe de Cuzey introduces Spark to people already familiar with Pig:

I like to think of Pig as a high-level Map/Reduce commands pipeline. As a former SQL programmer, I find it quite intuitive, and at my organization our Hadoop jobs are still mostly developed in Pig.

Pig has a lot of qualities: it is stable, scales very well, and integrates natively with the Hive metastore HCatalog. By describing each step atomically, it minimizes conceptual bugs that you often find in complicated SQL code.

But sometimes, Pig has some limitations that makes it a poor programming paradigm to fit your needs.

Philippe includes a couple of examples in Pig, PySpark, and SparkSQL.  Even if you aren’t familiar with Pig, this is a good article to help familiarize yourself with Spark.

Comments closed

Comparing Nullable Columns

Daniel Hutmacher shows an elegant way to compare multiple nullable columns on two tables:

Because we’ve added OR conditions into the mix, we’re forced to use the Nested Loop join, which loops over table B for every single row in A. That’s a lot of index scans and it comes with a hefty price tag.

Here’s an absolutely eye-watering beautiful pattern that I found on the Interwebs (though I forgot where) the other day.

This is an interesting use of INTERSECT.  Check it out.

Comments closed

Shortest Paths

Sanjay Mishra and Arvind Shaymsundar tie R to SQL Server to solve a pathfinding problem faster:

In such a case, if a developer were to implement Dijkstra’s algorithm to compute the shortest path within the database using T-SQL, then they could use approaches like the one at Hans Oslov’s blog. Hans offers a clever implementation using recursive CTEs, which functionally does the job well. This is a fairly complex problem for the T-SQL language, and Hans’ implementation does a great job of modelling a graph data structure in T-SQL. However, given that T-SQL is mostly a transaction and query processing language, this implementation isn’t very performant, as you can see below.

The important thing to remember is that these technologies tend to complement each other rather than supplant them.

Comments closed

Azure SQL Data Warehouse GA

James Serra notes that Azure SQL Data Warehouse is now generally available:

In brief, SQL DW is a fully managed data-warehouse-as-a-service that you can provision in minutes and scale up in seconds.  With SQL DW, storage and compute scale independently.  You can dynamically deploy, grow, shrink, and even pause compute, allowing for cost savings.  Also, SQL DW uses the power and familiarity of T-SQL so you can integrate query results across relational data in your data warehouse and non-relational data in Azure blob storage or Hadoop using PolyBase.  SQL DW offers an availability SLA of 99.9%, the only public cloud data warehouse service that offers an availability SLA to customers.

The pricing calculator now reflects GA prices.

Comments closed

Desired Enhancements

Jason Brimhall has a list of some nice SQL Server updates and bug fixes, followed by some things he’d like to see:

Digging a little deeper on this one. I would really love to see an enhancement to Resource Governor. Not just any enhancement will do. I need it to be enhanced so it will also affect the reporting services engine and the integration services engine in addition to the database engine. I want to be able to use RG to prevent certain reports from over consuming resources within the SSRS engine. Or for that matter, I want to make sure certain SSIS packages do not consume too much memory. If I can implement constraints on resources for these two engines it would be a huge improvement.

Check it out.

Comments closed

VoltDB

Kyle Kingsbury looks at VoltDB:

Unlike most SQL databases, which default to weaker isolation levels for performance reasons, VoltDB chooses to provide strong serializable isolation by default: the combination of serializability’s multi-object atomicity, and linearizability’s real-time constraints.

Serializability is the strongest of the four ANSI SQL isolation levels: transactions must appear to execute in some order, one at a time. It prohibits a number of consistency anomalies, including lost updates, dirty reads, fuzzy reads, and phantoms.

If you use VoltDB, it sounds like upgrading to 6.4 is a good idea.

Comments closed

Out With SQLPS

Mike Fal casts out SQLPS and looks at new and updated Powershell cmdlets:

An important facet of the provider is that it behaves like a file system. How many of us have deleted files from the command line? Have you also used a PowerShell one-liner to delete old files, like backups? If you have, you might be familiar with two parameters: -WhatIf and -Confirm. These two switch parameters are extremely helpful because they can keep you from cutting yourself with that sharp PowerShell knife.

With the July 2016 update, the SQL Server provider now supports the use of these two switches. While using them may not be a common situation, it’s good to know that they are there. After all, it could be handy if you wanted to clean up some items from the command line, like maybe a junk database. Now you can both check what you’re going to do before you do it, along with getting a confirmation question when you go for the actual delete:

I am a huge fan of the -WhatIf switch, so that gets a thumbs up from me.

Comments closed