Now, FizzBuzz is normally done with a loop, but as Russ said, we are using T-SQL, so batch code is always the goal. That said, what table should I query to get the numbers 1-100? I decided to keep it simple: use a system view that has more than 100 rows, the ROW_NUMBER function, and TOP to restrict the result.
Read on to see several answers to this problem, some better than others.
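The T-SQL from the post isn't reproduced here. As a rough Python analogue of the set-based idea (derive the full set of numbers first, then map every row through a CASE-style expression instead of printing inside a loop), a minimal sketch might look like this. `fizzbuzz` and `fizzbuzz_label` are hypothetical names, and `range()` only stands in for the ROW_NUMBER/TOP trick.

```python
def fizzbuzz_label(n: int) -> str:
    # Like a T-SQL CASE expression: test the most specific condition
    # (divisible by both 3 and 5, i.e. by 15) first.
    if n % 15 == 0:
        return "FizzBuzz"
    if n % 3 == 0:
        return "Fizz"
    if n % 5 == 0:
        return "Buzz"
    return str(n)


def fizzbuzz(limit: int = 100) -> list[str]:
    # range() stands in for the numbers 1-100 that the post derives with
    # ROW_NUMBER() and TOP over a system view; the whole set exists first
    # and is then projected through the CASE-like function in one expression.
    return [fizzbuzz_label(n) for n in range(1, limit + 1)]


print("\n".join(fizzbuzz(15)))
```

The ordering of the tests matters in both languages: checking divisibility by 3 first would label 15 as "Fizz".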
Have you ever executed a query in SQL Server Management Studio, looked at the execution plan, and noticed that it was different from the plan generated on the server?
A potential reason for this could be different option settings. The options represent the SET values of the current session. SET options can affect how the query is executed, and thus lead to a different execution plan. One place to find these options within SSMS is Tools -> Options -> Query Execution -> SQL Server -> Advanced.
Click through for lots of information including a script John provided to see which options are currently on.
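John's script isn't reproduced here. One common way to see which SET options are on for a session is the `@@OPTIONS` bitmask. The Python sketch below decodes such a value and diffs two sessions' values, assuming you have already run `SELECT @@OPTIONS;` in each session (for example, once in SSMS and once from the application). The bit values follow the documented `@@OPTIONS` table. `options_on` and `differing_options` are hypothetical helper names.

```python
# Bit values of SQL Server's @@OPTIONS, as documented for SET options.
OPTION_BITS = {
    1: "DISABLE_DEF_CNST_CHK",
    2: "IMPLICIT_TRANSACTIONS",
    4: "CURSOR_CLOSE_ON_COMMIT",
    8: "ANSI_WARNINGS",
    16: "ANSI_PADDING",
    32: "ANSI_NULLS",
    64: "ARITHABORT",
    128: "ARITHIGNORE",
    256: "QUOTED_IDENTIFIER",
    512: "NOCOUNT",
    1024: "ANSI_NULL_DFLT_ON",
    2048: "ANSI_NULL_DFLT_OFF",
    4096: "CONCAT_NULL_YIELDS_NULL",
    8192: "NUMERIC_ROUNDABORT",
    16384: "XACT_ABORT",
}


def options_on(value: int) -> list[str]:
    # Names of every option whose bit is set, in ascending bit order.
    return [name for bit, name in OPTION_BITS.items() if value & bit]


def differing_options(a: int, b: int) -> list[str]:
    # XOR keeps only the bits that differ between the two sessions.
    diff = a ^ b
    return [name for bit, name in OPTION_BITS.items() if diff & bit]


print(options_on(5496))
print(differing_options(5496, 5432))
```

For example, `differing_options(5496, 5432)` reports only `ARITHABORT`, a frequently cited culprit when SSMS and an application end up with different plans for the same query.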
What does all that mean? No idea. Clearly some JSON is returned here, and it can be deserialized to extract meaning. Is this useful? I think graphs solve a certain set of problems very well, and more efficiently than relational systems. Certainly I could implement a graph structure relationally, but at scale I'm not sure the queries would be as easy to write or would run as quickly.
I don’t know if I’d use a graph structure in any of the problems we try to solve in the SQLServerCentral app, but who knows. Maybe we would if we could.
Steve leaves this with more questions than answers, but he does give a very simple repro script if you want to futz about with graphs.
Basically, it escapes any occurrence of the second parameter within the first parameter. So when would we be using it in dynamic SQL? Well, probably the most common way I’ve used it is when I’m building a list of commands I want to run.
Click through for more details, including valid quote characters.
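Assuming the function under discussion is `QUOTENAME` (the mention of valid quote characters suggests so), its documented behavior can be approximated in Python: wrap the string in the chosen delimiter pair, double every occurrence of the closing delimiter, and return NULL (`None` here) for input longer than 128 characters or an unsupported quote character. This is a sketch for intuition, not the engine's implementation; `quotename` and the sample schema/table names are hypothetical.

```python
from typing import Optional

# Delimiter pairs accepted as QUOTENAME's quote_character.
# Either half of a bracket-style pair selects the same pair.
_PAIRS = {
    "[": ("[", "]"), "]": ("[", "]"),
    "(": ("(", ")"), ")": ("(", ")"),
    "<": ("<", ">"), ">": ("<", ">"),
    "{": ("{", "}"), "}": ("{", "}"),
    "'": ("'", "'"), '"': ('"', '"'), "`": ("`", "`"),
}


def quotename(s: Optional[str], quote_char: str = "[") -> Optional[str]:
    # Mirror QUOTENAME returning NULL for oversized input or a bad delimiter.
    if s is None or len(s) > 128 or quote_char not in _PAIRS:
        return None
    open_c, close_c = _PAIRS[quote_char]
    # Escape by doubling the closing delimiter, then wrap.
    return open_c + s.replace(close_c, close_c * 2) + close_c


# Building a list of commands to run, as the post describes (hypothetical names).
tables = [("dbo", "Order Details"), ("sales", "Weird]Name")]
commands = [f"TRUNCATE TABLE {quotename(s)}.{quotename(t)};" for s, t in tables]
print("\n".join(commands))
```

Only the closing delimiter needs doubling, because that is the only character that could prematurely end the quoted identifier.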
Given a room of 23 random people, what are the chances that two or more of them have the same birthday?
This problem is a little different from the earlier ones, where we actually knew what the probability in each situation was.
What are the chances that two people do NOT share the same birthday? Let us exclude leap years for now. The chance that two people do not share the same birthday is 364/365, since one person's birthday is already a given. In a group of 23 people, there are 253 possible pairs (23 × 22 / 2). Treating those pairs as independent, the chance of no two people sharing a birthday is approximately 364/365 multiplied by itself 253 times, or (364/365)^253. The chance of two people sharing a birthday, then, per the basics of probability, is 1 minus this.
The funny thing for me is that I've had the Birthday problem explained three separate times, each time using the 20-30 people in the classroom as a demo. In none of those three cases was there a match, so although I understand that it is correct and why it is correct, the 100% failure to replicate led a nagging little voice in the back of my mind to discount it.
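To check the arithmetic, the Python sketch below computes both the pairwise shortcut described above and the exact value. The exact value multiplies the chances that each successive person avoids every birthday already taken (still ignoring leap years). The pairwise version treats the 253 pairs as independent, which they are not quite, so it lands slightly low: about 50.0% versus about 50.7% for 23 people. The function names are hypothetical.

```python
import math


def p_shared_pairwise_approx(n: int, days: int = 365) -> float:
    # The shortcut: each of the C(n, 2) pairs independently avoids a match
    # with probability (days - 1) / days.
    pairs = math.comb(n, 2)
    return 1 - ((days - 1) / days) ** pairs


def p_shared_exact(n: int, days: int = 365) -> float:
    # Exact: the k-th person (0-based) must avoid the k birthdays already taken.
    p_all_distinct = 1.0
    for k in range(n):
        p_all_distinct *= (days - k) / days
    return 1 - p_all_distinct


for n in (20, 23, 30):
    print(n, round(p_shared_pairwise_approx(n), 4), round(p_shared_exact(n), 4))
```

Even at 30 people, a classroom has roughly a 30% chance of no match at all, so three misses in a row is unlucky but not shocking.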
As you can see from the animation, the algorithm is quite simple:
- First, we identify the user and ‘current’ song to start with (red line)
- Next, we identify the other users who have also listened to this song (green line)
- Then we find the other songs which those other users have also listened to (blue, dotted line)
- Finally, we direct the current user to the top songs from those other songs, prioritized by the number of times they were listened to (this is represented by the thick violet line.)
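The four steps above can be sketched in plain Python over a hypothetical list of `(user, song, play_count)` rows. This is an in-memory illustration of the logic, not the SQL Server 2017 graph implementation from the post; the sample data and the `recommend` name are invented for the example.

```python
from collections import Counter


def recommend(listens, current_user, current_song, top_n=5):
    # Step 1: current_user and current_song are the starting point (inputs).
    # Step 2: other users who have also listened to the current song.
    co_listeners = {
        u for u, s, _ in listens if s == current_song and u != current_user
    }
    # Step 3: other songs those users listened to, accumulating play counts.
    scores = Counter()
    for u, s, plays in listens:
        if u in co_listeners and s != current_song:
            scores[s] += plays
    # Step 4: top songs, prioritized by how many times they were played.
    return [song for song, _ in scores.most_common(top_n)]


# Hypothetical listening history.
listens = [
    ("alice", "s1", 3),
    ("bob", "s1", 2), ("bob", "s2", 10),
    ("carol", "s1", 1), ("carol", "s2", 1), ("carol", "s3", 4),
    ("dave", "s4", 50),  # never played s1, so never reached
]
print(recommend(listens, "alice", "s1"))
```

Note that dave's heavily played song never appears: the recommendation only follows edges that pass through the current song.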
The algorithm is simple, but as you will see, it is effective at meeting our requirement. Now, let's see how to actually implement this in SQL Server 2017.
Click through for animated images as well as an actual execution plan and recommendations for graph query optimization (spoilers: columnstore all the things). They also link to the GitHub project where you can try it out yourself.
SQL Graph is a similar concept to what is described above, but built into the core SQL Server engine. This means two new table types, NODE and EDGE, and a few new T-SQL functions, in particular MATCH(). At the time of writing, SQL Graph is only available in SQL Server 2017 CTP 2.0. You can read more and download CTP 2.0 here: https://blogs.technet.microsoft.com/dataplatforminsider/2017/04/19/sql-server-2017-community-technology-preview-2-0-now-available/. Once CTP 2.0 is installed, there is nothing else you need to do to enable the new graph syntax and storage.
There is an example you can download from Microsoft which has a similar setup to the example in the image above. However, I have used some real data shredded from IMDB, the Internet Movie Database. This data is available to download from Kaggle: https://www.kaggle.com/deepmatrix/imdb-5000-movie-dataset
Click through for a video demonstration as well.
Graph extensions are fully integrated in the SQL Server engine. Node and edge tables are just new types of tables in the database. The same storage engine, metadata, query processor, etc., is used to store and query graph data. All security and compliance features are also supported. Other cutting-edge technologies like columnstore, ML using R Services, HA, and more can also be combined with graph capabilities to achieve more. Since graphs are fully integrated in the engine, users can query across their relational and graph data in a single system.
This is interesting. One concern I have had with graph databases is that graphs are storing the same information as relations but in a manner which requires two distinct constructs (nodes and edges) versus one (relations). This seems to be a hybrid approach, where the data is stored as a single construct (relations) but additional syntax elements allow you to query the data in a more graph-friendly manner. I have to wonder how it will perform in a production scenario compared to Neo4j or Giraph.
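To make the "stored as relations, queried as a graph" point concrete, here is a hedged Python sketch: node tables become keyed collections, an edge table becomes rows of `(from_id, to_id)`, and a two-hop pattern such as "actors who appeared in the same movie" is just two joins through the edge table. The names and rows are hypothetical. In SQL Server the same pattern would be written with `MATCH` in the `WHERE` clause.

```python
# Hypothetical node "tables": id -> name.
people = {1: "Actor A", 2: "Actor B", 3: "Actor C"}
movies = {10: "Movie X", 11: "Movie Y"}

# Hypothetical edge "table": each row is (from_id, to_id), person -> movie.
acted_in = [(1, 10), (2, 10), (2, 11), (3, 11)]


def co_stars(person_id: int) -> list[str]:
    # Roughly what a pattern like MATCH(p1-(a1)->m<-(a2)-p2) expresses:
    # first hop, person -> movies via the edge rows...
    movies_of = {m for p, m in acted_in if p == person_id}
    # ...second hop, movies <- other people via the same edge rows.
    return sorted(
        {people[p] for p, m in acted_in if m in movies_of and p != person_id}
    )


print(co_stars(2))
```

Every hop is an ordinary filter over relational rows, which is exactly why the hybrid approach can reuse the existing storage engine; how well that scales against a native graph store is the open question raised above.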
That is not really going to work out for us…
So I'm not liking the look of this. Going through the results, it seems to me that they are just not useful. This isn't the computer's fault (it's done exactly what I've told it to do), but a more useful result would be a list of columns and then either a simple ‘Yes’ or ‘No’.
There’s syntax for this…
This is helpful for normalizing a bunch of wide, related tables into a subclass/superclass pattern.
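The post's own syntax isn't shown here. As a language-neutral illustration of the result shape described above (one row per column, with a Yes or No for each table), the Python sketch below builds that grid from hypothetical table definitions. Columns marked Yes in every table are natural candidates for a shared superclass table; the rest belong in the subclass tables. `column_presence` and the table/column names are assumptions for the example.

```python
def column_presence(tables: dict[str, list[str]]) -> list[tuple[str, ...]]:
    # Union of all column names across the tables, in first-seen order.
    all_cols: list[str] = []
    for cols in tables.values():
        for c in cols:
            if c not in all_cols:
                all_cols.append(c)
    # One row per column: the name, then Yes/No for each table in order.
    return [
        (c, *("Yes" if c in cols else "No" for cols in tables.values()))
        for c in all_cols
    ]


# Hypothetical wide, related tables.
tables = {
    "Car": ["Id", "Make", "Doors"],
    "Truck": ["Id", "Make", "Payload"],
}
for row in column_presence(tables):
    print(row)
```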
An easier way to do it is to use the normal distribution, via the central limit theorem. My post on the theorem illustrates that the sum (or mean) of a sample will approximately follow a normal distribution if the sample size is large enough. We will use that, along with the rules for determining probabilities in a normal distribution, to arrive at the probability in this case.
Problem: I have a group of 100 friends who are smokers. The probability of a random smoker having lung disease is 0.3. What are the chances that a maximum of 35 people wind up with lung disease?
Click through for the example.
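The count of smokers with lung disease is binomial with n = 100 and p = 0.3, so its mean is np = 30 and its standard deviation is sqrt(np(1 - p)) = sqrt(21), about 4.58. Without any correction, the z-score for 35 is (35 - 30) / 4.58, about 1.09. The Python sketch below compares the normal approximation, with and without a continuity correction (an addition here, not necessarily the post's method), against the exact binomial sum. Treat it as a check on the approach, not as the post's worked answer; the helper names are hypothetical.

```python
import math


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    # Standard identity: Phi(z) = (1 + erf(z / sqrt(2))) / 2.
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))


def binom_cdf(k: int, n: int, p: float) -> float:
    # Exact P(X <= k) for X ~ Binomial(n, p).
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


n, p, k = 100, 0.3, 35
mu = n * p                            # 30
sigma = math.sqrt(n * p * (1 - p))    # sqrt(21)

approx_plain = normal_cdf(k, mu, sigma)
approx_cc = normal_cdf(k + 0.5, mu, sigma)  # continuity correction: use 35.5
exact = binom_cdf(k, n, p)

print(f"normal, no correction:   {approx_plain:.4f}")
print(f"normal, with correction: {approx_cc:.4f}")
print(f"exact binomial:          {exact:.4f}")
```

The continuity correction accounts for approximating a discrete count with a continuous curve: "at most 35" covers the whole bar at 35, which extends to 35.5.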