Press "Enter" to skip to content

Category: T-SQL

Creating Graph Objects In SQL Server

Steve Jones creates a simple graph relationship in SQL Server 2017:

What does all that mean? No idea. Clearly there is JSON that’s returned here and can be deserialized to gather meanings. Is this useful? I think graphs solve a certain set of problems very well, and more efficiently than relational systems. Certainly I could implement a graph structure relationally, but at scale I’m not sure the queries would be as easy to write or run as quickly.

I don’t know if I’d use a graph structure in any of the problems we try to solve in the SQLServerCentral app, but who knows. Maybe we would if we could.

Steve leaves this with more questions than answers, but he does give a very simple repro script if you want to futz about with graphs.

Comments closed

The Birthday Problem

Mala Mahadevan explains the Birthday problem and demonstrates it with SQL and R:

Given a room of 23 random people, what are chances that two or more of them have the same birthday? 

This problem is a little different from the earlier ones, where we actually knew what the probability in each situation was.

What are chances that two people do NOT share the same birthday? Let us exclude leap years for now..chances that two people do not share the same birthday is 364/365, since one person’s birthday is already a given. In a group of 23 people, there are 253 possible pairs (23*22)/2. So the chances of no two people sharing a birthday is 364/365 multiplied 253 times. The chances of two people sharing a birthday, then, per basics of probability, is 1 – this.

The funny thing for me is that I’ve had the Birthday problem explained three separate times using as a demo the 20-30 people in the classroom.  In none of those three cases was there a match, so although I understand that it is correct and how it is correct, the 100% failure to replicate led a little nagging voice in the back of my mind to discount it.

Comments closed

Building A Recommendation System With Graph Data

Arvind Shyamsundar and Shreya Verma show how to implement a recommender system using SQL Server 2017’s new graph database functionality:

As you can see from the animation, the algorithm is quite simple:

  • First, we identify the user and ‘current’ song to start with (red line)
  • Next, we identify the other users who have also listened to this song (green line)
  • Then we find the other songs which those other users have also listened to (blue, dotted line)
  • Finally, we direct the current user to the top songs from those other songs, prioritized by the number of times they were listened to (this is represented by the thick violet line.)

The algorithm above is quite simple, but as you will see it is quite effective in meeting our requirement. Now, let’s see how to actually implement this in SQL Server 2017.

Click through for animated images as well as an actual execution plan and recommendations for graph query optimization (spoilers:  columnstore all the things).  They also link to the GitHub project where you can try it out yourself.

Comments closed

Graph Data In SQL Server

Terry McCann has a first look at SQL Server 2017’s graph data capabilities:

SQL Graph is a similar concept to what is described above, but built in to the core SQL Server engine. This means 2 new table types NODE and EDGE and a few new TSQL functions in particular MATCH(). SQL Graph at the time of writing is only available in SQL 2017 ctp 2.0. You can read more and download ctp2.0 here https://blogs.technet.microsoft.com/dataplatforminsider/2017/04/19/sql-server-2017-community-technology-preview-2-0-now-available/. Once ctp 2.0 is installed there is nothing else you need to do to enable the new graph syntax and storage.

There is an example you can download from Microsoft which is a similar set up to the example in the image above. However I have used some real data shredded from IMDB the internet movie database. This data is available to download from Kaggle https://www.kaggle.com/deepmatrix/imdb-5000-movie-dataset

Click through for a video demonstration as well.

Comments closed

SQL Server Graph Database

The SQL Server team announces graph extensions in SQL Server 2017:

Graph extensions are fully integrated in the SQL Server engine. Node and edge tables are just new types of tables in the database. The same storage engine, metadata, query processor, etc., is used to store and query graph data. All security and compliance features are also supported. Other cutting-edge technologies like columnstore, ML using R Services, HA, and more can also be combined with graph capabilities to achieve more. Since graphs are fully integrated in the engine, users can query across their relational and graph data in a single system.

This is interesting.  One concern I have had with graph databases is that graphs are storing the same information as relations but in a manner which requires two distinct constructs (nodes and edges) versus one (relations).  This seems to be a hybrid approach, where the data is stored as a single construct (relations) but additional syntax elements allow you to query the data in a more graph-friendly manner.  I have to wonder how it will perform in a production scenario compared to Neo4j or Giraph.

Comments closed

Comparing Tables With Powershell

Shane O’Neill compares table columns with T-SQL and Powershell:

That is not really going to work out for us…
So I’m not liking the look of this, and going through the results, it seems to me that these results are just not useful. This isn’t the computers fault – it’s done exactly what I’ve told it to do – but a more useful result would be a list of columns and then either a simple ‘Yes’, or a ‘No’.

There’s syntax for this…PIVOT

This is helpful for normalizing a bunch of wide, related tables into a subclass/superclass pattern.

Comments closed

When Binomials Converge

Mala Mahadevan shows an example of the central limit theorem in action, as a large enough sample from a binomial distribution approximates the normal:

An easier way to do it is to use the normal distribution, or central limit theorem. My post on the theorem illustrates that a sample will follow normal distribution if the sample size is large enough. We will use that as well as the rules around determining probabilities in a normal distribution, to arrive at the probability in this case.
Problem: I have a group of 100 friends who are smokers.  The probability of a random smoker having lung disease is 0.3. What are chances that a maximum of 35 people wind up with lung disease?

Click through for the example.

Comments closed

Linear Regression In SQL

Phil Factor shows how to generate a quick linear regression using SQL, Powershell, and Gnuplot:

It looks a bit like someone has fired a shotgun at a wall but is there a relationship between the two variables? If so, what is it? There seems to be a weak positive linear relationship between the two variables here so we can be fairly confident of plotting a trendline.

Here is the data, and we will proceed to calculate the slope and intercept. We will also calculate the correlation.

It’s good to know that this is possible, but I’d switch to R or Python long before.

Comments closed