Press "Enter" to skip to content

Curated SQL Posts

Shortest Path in Graphs

Mala Mahadevan looks at the shortest path function in SQL Server Graph:

‘Shortest path’ is the term accorded to the shortest distance between any two points, referred to as nodes in graph databases. The algorithm that helps you find the shortest distance between node A and node B is called the Shortest Path Algorithm.

Let us go back to the movie database. We have two people, say Amrish Puri and Harrison Ford. Amrish wants to meet Harrison Ford. He has not acted with Ford, he may have a few connections in common – or people who know him. Or people who know him who know him. This is one way to get an introduction. Or, let us say you are interviewing for a job. You want to see if someone in your network works at that place – so that you can get an idea of what the job or the company is like. So you go on linkedin – do a search for the company, look under ‘people’, and it tells you if anyone in your network is there, or someone is 2 levels away, or 3. Those numbers are what we get from the shortest path feature.

Read on for a few examples of shortest path in action.

Comments closed

Divide, RankX, and N/A

Rob Collie has some fun with DIVIDE():

A blank cell in a report is sometimes a source of confusion for those human beings consuming our work. “What does a blank cell mean,” they ask.  “It’s a division by zero,” we reply.  “Wut,” they then ask.  “Trust me,” we say, “you don’t want to see the alternative.”  “But I don’t trust you, and now I don’t trust this whole report,” is what they sometimes say next – whether under their breath or out loud.

But “N/A” is a lovely value to display.  It raises far fewer eyebrows.  “Oh, it says our Profit Margin % for electric blankets sold in Cancun is “N/A” – I get it, we’ve never sold that product there.”  No convo required.

Click through for the full story.

Comments closed

A Primer on Survey Analysis

Federico Pascual has a long primer on survey analysis:

When it comes to customer feedback, you’ll find that not all the information you get is useful to your company. This feedback can be categorized into non-insightful and insightful data. The former refers to data you had already spotted as problematic, while insightful information either helps you confirm your hypotheses or notice new issues or opportunities. 

Let’s imagine your company carries out a customer satisfaction survey, and 60% of the respondents claim that the pricing of your product or service is too high. You can use that valuable data to make a decision. That’s why this data is also called actionable insights because they either lead to action, validation, or rethinking of a specific strategy you have already implemented. 

Survey design and implementation can be pretty difficult. This article does a good job pushing you away from some of the pitfalls around it.

Comments closed

Linear Regression in Power BI

Joseph Yeates shows how to implement linear regression in Power BI:

The goal of a simple linear model is to fit a line onto this plot to summarize the shape of the data using the equation above.

The “a” value is the slope of the fitted line (rise over run) and the “b” value is the intercept on the y-axis (when x is equal to zero).

In the gapminder example, the life expectancy column was assigned as the “y” variable, as it is the outcome that we are interested in predicting or understanding. The year1950 column was assigned as the “x” variable, as it is what we are using to try and measure the change in life expectancy.

This is a little more complicated than adding a regression line to a scatterplot (the “normal” way to do linear regression with Power BI) but this method lets you work with the outputs in a way that the normal method doesn’t.

Comments closed

Secret Sharing and Neural Networks

Adrian Colyer reviews an interesting paper:

Take a system trained to make predictions on a language (word or character) model – an example you’re probably familiar with is Google Smart Compose. Now feed it a prefix such as “My social security number is “. Can you guess what happens next?

Read the whole thing. There’s a bit of discussion at the end around how you can stop this learning of secrets.

Comments closed

Multi-Table, Multi-Column Keys in Power BI

Alexander Arvidsson has a riff on a common Power BI question:

The keys we need for connecting the two tables are “region” and “certification”, respectively. A key on just one of these columns won’t ensure uniqueness and here is hurdle number one: how do we create a relationship in Power BI that is based on more than one key? Simple answer is that we can’t. But what we CAN do is create a concatenated column with the data we need to create a unique key, and then do our relationships based on that. 

Click through for the process.

Comments closed

Forced Parameterization and Filtered Indexes

Aaron Bertrand walks us through a case where filtered indexes become unhelpful:

Again, focusing on the areas highlighted in orange: the statement has a parameter @0 (previously it had @1) but, more importantly, the clustered index is scanned now instead of the filtered index. This has impacts throughout the plan, including how many rows are both estimated to be read and actually read in order to return those 11 rows. You can see a much higher I/O cost (about 22X), the predicate is now listed explicitly in the tooltip, and you can see warnings about residual I/O (which just means a lot more rows were read than necessary). The root operator still has the warning about the unmatched index, so at least the plan gives you some clue that a filtered index exists that might be useful if you change the parameterization setting for the database (or add OPTION (RECOMPILE) to the statement):

There are still ways to make filtered indexes work with forced parameterization, such as index hints, but Aaron does a great job explaining why something which seems like it should just work doesn’t always.

Comments closed

Azure Kubernetes Service Max Volume Count

Chris Taylor explains an error message in Azure Kubernetes Service:

Whilst playing around with my session for Techorama.nl I encountered an error I hadn’t seen previously whilst deploying SQL Server on Linux in Azure Kubernetes Service (AKS)

0/1 nodes are available: 1 node(s) exceed max volume count

The yaml I used was only slightly modified (mainly names) from scripts used on minikube and docker-desktop so I was a little confused as to why I was getting this in AKS.

Read on to understand what’s happening here and how you can fix it.

Comments closed

Building Graph Queries with SQL Server Graph

Mala Mahadevan takes us through a few examples of queries in SQL Server’s built-in graph engine:

The main goal behind a graph design is to help you answer queries – so what are the questions you’d ask of a movie database, if you had one? Mine would typically be like below.

1 Who are the actors in this movie?
2 Who is this movie directed by?
3 Who is the most prolific actor, according this dataset?
4 How many actors are also directors?
..and so on.

Read on to see how you can write these queries.

Comments closed