Press "Enter" to skip to content

Category: T-SQL

Generating Comma-Delimited Strings

Shane O’Neill has a good reason to upgrade to SQL Server 2017, which is not having to deal with FOR XML PATH hacks anymore:

Commas are all the rage nowadays:

There are a fair number of questions nowadays about returning data from a database in a comma separated string. Sure the application should probably do that but hey, database servers are expensive, why not get some bang for your bucks!

Protip:  I use the colon as a separator because sometimes commas will get stuck in the Ethernet cable.

Comments closed

Understanding CROSS APPLY

Andy Levy has a T-SQL programming breakthrough:

Finally, this week I had a breakthrough. I was working on updating a bunch of data but it was breaking on a small subset of that data. In this case, I was attempting to JOIN two tables on fields that should have been INTs, but in a very small number of cases one side was using a comma-delimited string. The user told me that someone else had done these updates in the past and didn’t encounter the problem I was having (so I knew that it was something i was doing “wrong”), but given that it was only a handful of broken updates she was OK with manually doing the updates (we were scripting it because we were updating potentially tens of thousands of records).

I am not OK with manually fixing this in the future. I wanted to know how the other DBA had done it before. I dug into some history and found CROSS APPLY. My nemesis. I was determined to figure out how to use it this time.

The APPLY operator is extremely powerful in the right set of circumstances.  Andy shows the “classic” use case, but there are a number of other uses for the operator.

Comments closed

T-SQL FizzBuzz Golf

Kenneth Fisher plays the FizzBuzz game with T-SQL and his chat piles on:

Now normally FizzBuzz is done with a loop, but as Russ said, we are using T-SQL so batch code is always the goal. That said, what table should I query to get the numbers 1-100? Well, I decided I’d just do something simple here and use a system view that’s more than 100 rows, the ROW_NUMBER function, and restrict it using TOP.

Read on to see several answers to this problem, some better than others.

Comments closed

SQL Server Query Options

John Morehouse shows how to set your query options to be the same as that problematic application which is generating nasty queries:

Have you every executed a query in SQL Server Management Studio, looked at the execution plan, and noticed that it was a different plan than what was generated on the server?

A potential reason for this could be a different option settings.  The options represent the SET values of the current session.  SET options can affect how the query is execute thus having a different execution plan.   You can find these options in two places within SSMS under Tools -> Options -> Query Execution -> SQL Server -> Advanced.

Click through for lots of information including a script John provided to see which options are currently on.

Comments closed

Creating Graph Objects In SQL Server

Steve Jones creates a simple graph relationship in SQL Server 2017:

What does all that mean? No idea. Clearly there is JSON that’s returned here and can be deserialized to gather meanings. Is this useful? I think graphs solve a certain set of problems very well, and more efficiently than relational systems. Certainly I could implement a graph structure relationally, but at scale I’m not sure the queries would be as easy to write or run as quickly.

I don’t know if I’d use a graph structure in any of the problems we try to solve in the SQLServerCentral app, but who knows. Maybe we would if we could.

Steve leaves this with more questions than answers, but he does give a very simple repro script if you want to futz about with graphs.

Comments closed

The Birthday Problem

Mala Mahadevan explains the Birthday problem and demonstrates it with SQL and R:

Given a room of 23 random people, what are chances that two or more of them have the same birthday? 

This problem is a little different from the earlier ones, where we actually knew what the probability in each situation was.

What are chances that two people do NOT share the same birthday? Let us exclude leap years for now..chances that two people do not share the same birthday is 364/365, since one person’s birthday is already a given. In a group of 23 people, there are 253 possible pairs (23*22)/2. So the chances of no two people sharing a birthday is 364/365 multiplied 253 times. The chances of two people sharing a birthday, then, per basics of probability, is 1 – this.

The funny thing for me is that I’ve had the Birthday problem explained three separate times using as a demo the 20-30 people in the classroom.  In none of those three cases was there a match, so although I understand that it is correct and how it is correct, the 100% failure to replicate led a little nagging voice in the back of my mind to discount it.

Comments closed

Building A Recommendation System With Graph Data

Arvind Shyamsundar and Shreya Verma show how to implement a recommender system using SQL Server 2017’s new graph database functionality:

As you can see from the animation, the algorithm is quite simple:

  • First, we identify the user and ‘current’ song to start with (red line)
  • Next, we identify the other users who have also listened to this song (green line)
  • Then we find the other songs which those other users have also listened to (blue, dotted line)
  • Finally, we direct the current user to the top songs from those other songs, prioritized by the number of times they were listened to (this is represented by the thick violet line.)

The algorithm above is quite simple, but as you will see it is quite effective in meeting our requirement. Now, let’s see how to actually implement this in SQL Server 2017.

Click through for animated images as well as an actual execution plan and recommendations for graph query optimization (spoilers:  columnstore all the things).  They also link to the GitHub project where you can try it out yourself.

Comments closed

Graph Data In SQL Server

Terry McCann has a first look at SQL Server 2017’s graph data capabilities:

SQL Graph is a similar concept to what is described above, but built in to the core SQL Server engine. This means 2 new table types NODE and EDGE and a few new TSQL functions in particular MATCH(). SQL Graph at the time of writing is only available in SQL 2017 ctp 2.0. You can read more and download ctp2.0 here https://blogs.technet.microsoft.com/dataplatforminsider/2017/04/19/sql-server-2017-community-technology-preview-2-0-now-available/. Once ctp 2.0 is installed there is nothing else you need to do to enable the new graph syntax and storage.

There is an example you can download from Microsoft which is a similar set up to the example in the image above. However I have used some real data shredded from IMDB the internet movie database. This data is available to download from Kaggle https://www.kaggle.com/deepmatrix/imdb-5000-movie-dataset

Click through for a video demonstration as well.

Comments closed

SQL Server Graph Database

The SQL Server team announces graph extensions in SQL Server 2017:

Graph extensions are fully integrated in the SQL Server engine. Node and edge tables are just new types of tables in the database. The same storage engine, metadata, query processor, etc., is used to store and query graph data. All security and compliance features are also supported. Other cutting-edge technologies like columnstore, ML using R Services, HA, and more can also be combined with graph capabilities to achieve more. Since graphs are fully integrated in the engine, users can query across their relational and graph data in a single system.

This is interesting.  One concern I have had with graph databases is that graphs are storing the same information as relations but in a manner which requires two distinct constructs (nodes and edges) versus one (relations).  This seems to be a hybrid approach, where the data is stored as a single construct (relations) but additional syntax elements allow you to query the data in a more graph-friendly manner.  I have to wonder how it will perform in a production scenario compared to Neo4j or Giraph.

Comments closed