Using Spark For Investigation

Sean Owen tries to unravel the Tamam Shud mystery:

Several people have approached these letters as a cryptographic cipher. The odd circumstances of death do sound like something out of a John Le Carré spy novel. Some of the best attempts, however, fail to produce anything but truly convoluted parsings.

Another possibility may already have occurred to you: Are they the first letters of words in a sentence (aninitialism)? Some suspect this death was a suicide, and that the message is merely some form of final note. With this morbid scenario in mind, it’s easy to imagine many phrases, like “My Life Is All But Over,” that fit the letters because indeed their frequency seems to match that of English text.

This lead has been picked up a few times. These writeups (example) present indications that the message is indeed an initialism. However, they don’t apply what is arguably the clear statistical tool for this job. And they don’t take advantage of big data. So, let’s do both.

Read on for Chi Square testing and book parsing examples using Spark.  Spoiler alert:  Sean doesn’t solve the mystery, but it’s still a fun read.

Predictive Maintenance Solution Template

Kevin Feasel



Jaya Mathew has a SQL Server R Services template for predictive maintenance:

To illustrate the scenario, we will focus on companies who operate machines which encounter mechanical failures. These failures lead to downtime which has cost implications on any business, hence most companies are interested in predicting the failures ahead of time so that they can proactively prevent them. This scenario is aligned with an existing R Notebook published in the Cortana Intelligence Gallery but works with a larger dataset where we will focus on predicting component failures of a machine using raw telemetry, maintenance logs, previous errors/failures and additional information about the make/model of the machine. This scenario is widely applicable for almost any industry which uses machines that need maintenance. A quick overview of typical feature engineering techniques as well as how to build a model will be discussed below.

Understanding when machines are likely to break down is a very interesting statistical problem.  Check out the template.

Creating BACPAC Files

Kenneth Fisher needs a new BACPAC:

Why are we talking about it?

Well there are two reasons. First because I’m studying how to move databases from SQL Server to Azure SQL Database and back. My first blog on the subject was using the Deploy Database to Microsoft Azure SQL Database option to move a SQL Server database to Azure SQL Database.

Kenneth shows you how to do this through the UI as well as through SqlPackage.exe.

Service Fabric On Linux

Mark Russinovich announces that Azure Service Fabric will be available on Linux:

Given its beginnings, Service Fabric supports Windows servers and .NET applications, but many enterprises today run heterogeneous workloads, including Windows and Linux servers, .Net and Java applications, and SQL and NoSQL databases. That’s why I am excited to announce today that the preview of Service Fabric for Linux will be publicly available at our Ignite conference on September 26.  With today’s announcement customers can now provision Service Fabric clusters in Azure using Linux as the host operating system and deploy Java applications to Service Fabric clusters. Service Fabric on Linux will initially be available for Ubuntu, with support for RHEL coming soon.

This isn’t a huge announcement for many people, but it’s a positive sign.

Thinking About Azure SQL Database

Kevin Hill with an introductory-level discussion of Azure SQL Database:

Some basic terminology:

  • Cloud: No such thing.  It is just your stuff on someone else’s machines that they maintain for you.

  • Virtual Machine (VM): A Virtual Server on some physical servers…yours, or someone else’s.

  • Azure: Fancy name for Microsoft’s cloud. As a noun or an adverb it means “blue”.  Or a small butterfly.

  • Azure SQL database: Just a database in Azure on some storage

  • Azure Virtual Machine: A VM on Microsoft’s Azure servers, that you do not have to maintain the underlying physical infrastructure.

This is a nice, very high-level introduction to why Azure SQL Database exists.

Chi Square Tests

Mala Mahadevan discusses how to perform a Chi Square test:

For any dataset to lend itself to the Chi Square test it has to fit the following conditions  –

1 Both  variables are categorical (in this case – exposure to smoking – yes/no, and health condition – sick/not sick are both categorical).
2 Researchers used a random sample to collect data.
3 Researchers had an adequate sample size.Generally the sample size should be at least 100.
4 The number of respondents in each cell should be at least 5.

This is an easy case for using R over T-SQL—the Chi Square test is built in, whereas you have to roll your own T-SQL code.  Mala does show you how to do this from within SQL Server R Services as well.

WideWorldImporters Deadlocks

Kendra Little has a couple queries to force deadlocks in the WideWorldImporters database:

SQL Server’s deadlock manager woke up, looked around, and saw that our two session windows were stuck. They each were requesting locks that the other session wouldn’t give up– and if the deadlock manager didn’t break the deadlock, they’d be stuck there forever.

I didn’t set the deadlock priority on any of my transactions, so the deadlock manager picked the session that it thought would be the least work to roll back– and it became the victim.

Read on for the scripts and also some hints to help you learn more about deadlocks.

Paste The Plan

Brent Ozar introduces a new website:

It’s a free community service. We hope you love it as much as we do, especially all the cool new execution plan icons by our illustrator, Eric Larsen.

Down the road, we’re thinking about adding logins (so you can see past plans you’ve submitted), execution plan advice, image and HTML downloads (so you can embed the plan in your own blog or report), zooming, and more.

Check out the Paste The Plan website.

After Triggers On Memory-Optimized Tables

David Klee shows that you can put an AFTER trigger on a memory-optimized table in SQL Server 2016:

SQL Server 2016 supports AFTER triggers! I could find no good example of how to do this with a project I’m working on, so I figured out how to make this work. The following before and after screenshots are from a SQL Server 2016 instance and the DVDStore3 package. I modified a trigger to work with In-Memory OLTP.

Click through to see the code.

Finding Cumulative Updates

Jonathan Kehayias shows where to find older cumulative updates:

However, to my dismay that link took me to the download link for SQL Server 2014 SP1 CU8 and not to CU7 as I was expecting. So I tried other SP1 CU links and they all redirect to the CU8 download page. Slightly irritated I decided to send a Skype IM to Glenn (@GlennAlanBerry|Blog) what was going on because he keeps up with every update Microsoft publishes for SQL Server. Glenn told me that Microsoft’s recommendation and preference is for people to install the latest CU so the KB articles now link to the latest CU download page. That doesn’t really help me with adding a failed FCI node back into a FCI that is on CU7, so Glenn offered to share the CU7 file with me by Dropbox.

It turns out that I didn’t need Glenn to share the file with me, I needed to read the information on the KB article closer and pay attention.

Read on to find out where you can find the CU archives.


September 2016
« Aug Oct »