SQL Server Graph Database

The SQL Server team announces graph extensions in SQL Server 2017:

Graph extensions are fully integrated in the SQL Server engine. Node and edge tables are just new types of tables in the database. The same storage engine, metadata, query processor, etc., is used to store and query graph data. All security and compliance features are also supported. Other cutting-edge technologies like columnstore, ML using R Services, HA, and more can also be combined with graph capabilities to achieve more. Since graphs are fully integrated in the engine, users can query across their relational and graph data in a single system.

This is interesting.  One concern I have had with graph databases is that graphs are storing the same information as relations but in a manner which requires two distinct constructs (nodes and edges) versus one (relations).  This seems to be a hybrid approach, where the data is stored as a single construct (relations) but additional syntax elements allow you to query the data in a more graph-friendly manner.  I have to wonder how it will perform in a production scenario compared to Neo4j or Giraph.

Comparing Tables With Powershell

Shane O’Neill compares table columns with T-SQL and Powershell:

That is not really going to work out for us…
So I’m not liking the look of this, and going through the results, it seems to me that these results are just not useful. This isn’t the computers fault – it’s done exactly what I’ve told it to do – but a more useful result would be a list of columns and then either a simple ‘Yes’, or a ‘No’.

There’s syntax for this…PIVOT

This is helpful for normalizing a bunch of wide, related tables into a subclass/superclass pattern.

When Binomials Converge

Mala Mahadevan shows an example of the central limit theorem in action, as a large enough sample from a binomial distribution approximates the normal:

An easier way to do it is to use the normal distribution, or central limit theorem. My post on the theorem illustrates that a sample will follow normal distribution if the sample size is large enough. We will use that as well as the rules around determining probabilities in a normal distribution, to arrive at the probability in this case.
Problem: I have a group of 100 friends who are smokers.  The probability of a random smoker having lung disease is 0.3. What are chances that a maximum of 35 people wind up with lung disease?

Click through for the example.

Linear Regression In SQL

Phil Factor shows how to generate a quick linear regression using SQL, Powershell, and Gnuplot:

It looks a bit like someone has fired a shotgun at a wall but is there a relationship between the two variables? If so, what is it? There seems to be a weak positive linear relationship between the two variables here so we can be fairly confident of plotting a trendline.

Here is the data, and we will proceed to calculate the slope and intercept. We will also calculate the correlation.

It’s good to know that this is possible, but I’d switch to R or Python long before.

Trailing Space Behavior

Kevin Feasel



Brent Ozar shows how SQL Server deals with trailing spaces in columns:

The mind-blowing part to me is the <> operator – that seemed utterly crazy to me.

And if you add another table, and join to it, guess what happens:

The answer might be a bit surprising.

Measuring Correlation In SQL

Phil Factor shows how to calculate Kendall’s Tau and Spearman’s Rho in SQL:

Kendall’s Tau rank correlation is a handy way of determining how correlated two variables are, and whether this is more than chance. If you just want a measure of the correlation then you don’t have to assume very much about the distribution of the variables. Kendall’s Tau is popular with calculating correlations with non-parametric data. Spearman’s Rho is possibly more popular for the purpose, but Kendall’s tau has a distribution with better statistical properties (the sample estimate is close to a population variance) so confidence levels are more reliable, but in general, Kendall’s tau and Spearman’s rank correlation coefficient are very similar. The obvious difference between them is that, for the standard method of calculation,  Spearman’s Rank correlation required ranked data as input, whereas the algorithm to calculate Kendall’s Tau does this for you.  Kendall’s Tau consumes any non-parametric data with equal relish.

Kendall’s Tau is easy to calculate on paper, and makes intuitive sense. It deals with the probabilities of observing the agreeable (concordant) and non-agreeable (discordant) pairs of rankings. All observations are paired with each of the others, A concordant pair is one whose members of one observation are both larger than their respective members of the other paired observation, whereas discordant pairs have numbers that differ in opposite directions. Kendall’s Tau-b takes tied rankings into account.

I appreciate Phil putting this series together.  I’d probably stick with R, but it’s good to have options.

Trick Co-Workers With This Extended Property

Kenneth Fisher shows how to use extended properties to hide a table from SQL Server Management Studio:

FYI I’ve tried this at the column and schema levels and it didn’t work.

Using this you can hide the object from SSMS object explorer without restricting its use in any way.

I’m curious if there are any other hidden uses of extended properties. I haven’t been able to find any documentation so if you’ve seen any please let me know!

I don’t think I’ve ever had cause to hide objects from Management Studio, but if you’re looking for next year’s April Fools prank, maybe?

Using OUTPUT To Get Change Counts

Manoj Pandey shows how to use the OUTPUT clause to determine the number of records inserted, updated, or deleted after a DML statement:

–> Question:

How can I get the numbers of records affected in the Merge statement, INSERT,UPDATE,DELETE separately and store it in a variable so I can get it in the application side?

Thanks !
–> My Answer:

You need to use OUTPUT clause with MERGE statement

Click through for a code sample.  The OUTPUT clause also works for non-MERGE statements like INSERT, UPDATE, and DELETE, though the “get changes by type” problem is really limited to the MERGE statement.

Don’t Hard-Code Values

Kevin Feasel



Jana Sattainathan argues against hard-coding values in queries:

I have heard arguments for doing this type source code

  • This is a one-time thing. We do not have the need to do it anywhere else

  • We are on a deadline

  • We do not have the ability to test if this was not done this way

  • My program is going away in a week

  • We do not have the time to correct this

  • I am just following the existing pattern

  • Unofficially (not) said – “This is my job security”

I’m with Jana in principle, but there are performance costs at the margin, making this less of a hard-and-fast rule than I’d like.

Random Number Lists

Kevin Feasel



Kenneth Fisher one method of creating lists of random numbers:

I was working on a blog post this weekend that required a list of random numbers. Now, this isn’t exactly an uncommon task, but it’s not as easy as it would seem. The random number function RAND() has a few inherent flaws. And from what I can remember the random functions from most languages have the same issue.

First a few quotes from BOL about RAND()

Returns a pseudo-random float value from 0 through 1, exclusive.

Note: pseudo-random

If a seed is not specified, the SQL Server Database Engine assigns a seed value at random. For a specified seed value, the result returned is always the same.

If you don’t specify the seed it gets selected at random. But that’s only once per query run, not once per row of the output.

Read on for Kenneth’s solution.


May 2017
« Apr