Press "Enter" to skip to content

Curated SQL Posts

Visualizing Rating Data

Stephanie Evergreen shows various ways to visualize ratings data:

Stacked Bars *seem* like a good idea – we show 100%, we can fit more questions and data into a similar amount of space – advantages, right? Except that stacked bars are difficult for people to read. How well can you compare the values of the orange segments? Not so much.

If you are going to use stacked bars, make some helpful formatting tweaks, like smarter color coding and an order from greatest to least.

A lot of this comes down to simplification and reduction of possibilities. Read the whole thing.

Comments closed

DISTINCT is not a Function

Lukas Eder corrects a misconception:

A very common misconception I often encounter with SQL users is the idea that DISTINCT is something like a function, and that it can take parenthesised arguments. Just recently, I’ve seen this Stack Overflow question where the OP was looking for a way to express this in jOOQ:

SELECT DISTINCT (emp.id), emp.fname, emp.name FROM employee emp;

Notice the parentheses around (emp.id), which look as though this is some special kind of DISTINCT usage, which is akin to a DISTINCT function. The idea is often that:

– The behaviour is somewhat different from omitting the parentheses
– The performance is faster, because only the ID needs to be considered for distinctness

Both of these ideas are (mostly) wrong, as Lukas shows.

Comments closed

Metadata-Driven ADF Pipelines

Paul Andrew wraps up a series on metadata-driven processing of Azure Data Factory pipelines. Part 3 covers the ADF wrapper necessary for our custom pipelines:

Firstly, to help guide this post below is a mock up of our Data Factory pipelines and activities to show the end goal. Hopefully this view informs how things are going to be connected using what I call a pipeline hierarchy system and how they will work in the overall framework. For our metadata processing framework we can make the following category distinctions about the activities represented:

Grandparent – This is the top level orchestration of our wider data platform solution. Here a scheduled trigger could be connected or processing in our solution grouped into natural areas. Technically this level isn’t required for our processing framework, but I’ve included it as good practice.
Parent – Our parent pipelines primary purpose (try saying that fast 3 times 🙂 ) is to handle the stages of our processing framework. The stages will then be passed off sequentially to our child pipeline using another execute pipeline activity.
Child – At this level in the framework the child is hitting the Azure Function to call the lowest level executors, or the pipelines that we want to actually do the work in our data platform solution. In my previous post I added some example metadata to call pipelines name Stage X-X. These in turn relate to a set of empty place holder pipelines that contained only Wait activities.

Part 4 puts it all together:

For the end to end run of the framework we have a few options to see progress once its been triggered. Before that its worth pointing out that in the below I’ve used the sample metadata provided with the database scripts in GitHub. Then for each execution pipeline I’ve added a single Wait activity with a random time delay of a few seconds. The point here is to test the framework execution, not the pipelines being called. To further clarify, the Stage X-X pipelines names should be replaced with your actual pipeline names in your data platform solution.

Definitely worth the read.

Comments closed

Syncing Mobile Apps via Change Tracking

Davide Mauri shows how we can perform data synchronization using change tracking in Azure:

Sending data from the cloud to the app is way more tricky. You want to do it in the most efficient way, to spare bandwidth and device battery life, so you need a way to know what has changed since the last time that specific user and device synced. As data is surely stored in a database of some sort, you also need some efficient method on the database side to make sure you can quickly get everything that is new or changed and that is in the scope for that specific user/device. If your mobile application is successful, this means that you may literally have millions and millions of rows or documents to scan and check for changes.

Not an easy task: all hope is lost then? Just send back the whole data set and that’s it? Of course not! We don’t want to just be developers, but better developers, right?

Modern databases can help a lot in tackling this challenge. Azure SQL, for example, has a feature called Change Tracking that, guess what?, will take care of keeping track of changes for you.

Davide includes a lot of detail and even a sample application on GitHub.

Comments closed

Currency Conversion in Power BI

Marco Russo takes us through some tips when performing currency conversions using Power BI:

The rule of thumb is to apply the currency exchange conversion upfront. Therefore, it is a good idea to solve all the scenarios requiring a single currency in the report by converting all amounts at the time of data import into the data model. When you need to create a report in multiple currencies, computing data in advance could be challenging and expensive. Therefore, a dynamic solution based on DAX measures and a properly designed data model makes more sense.

In this article, we only consider the third scenario, “Data in a single currency, report with multiple currencies”. The second scenario could be implemented by transforming data so that it is imported in the model in a single currency, moving the challenge over to the very scenario we describe in this article. An extended description of the three scenarios with other dynamic solutions is included in a chapter of the Analyzing Data with Microsoft Power BI and Power Pivot for Excel book.

This is quite a useful article if you work with multiple currencies.

Comments closed

Implicit Type Conversions with Spark SQL

Manoj Pandey walks us through an unexpected error with Spark SQL:

While working on some data analysis I saw one Spark SQL query was not getting me expected results. The table had some good amount of data, I was filtering on a value but some records were missing. So, I checked online and found that Spark SQL works differently compared to SQL Server, in this case while comparing 2 different datatypes columns or variables.

Read on to learn more about the issue. This is the downside of Feasel’s Law: just because both system interfaces are SQL doesn’t mean that they’re equivalent or that the assertions and assumptions you can make for one follow through to the next.

Comments closed

Managing Azure SQL Database with Golang

Silvano Coriani shows us that it’s possible to manage Azure SQL Databases with Go:

Goal for this post is to introduce how to start interacting with Azure SQL through Go (https://golang.org/), an open source programming language gaining lots of traction in developers’ community thanks to its simplicity and efficiency in scenarios like microservices and server apps (did I mention that Kubernetes itself is written in Go?).  

Azure SQL provides full support for Go developers on both control plane (deploy, manage and configure Azure SQL servers and databases) and data plane activities (connect, execute commands and queries against Azure SQL instances) through Azure SDK for Go and Microsoft SQL Server Driver for Go.

A companion code sample for this article, written using VS Code, can be found here.

To me, this is like the articles written back in 2007-2008 about how to manage SQL Server with Python. It’s cool and I hope it helps some people learn the language or find another good use for it, but I’m not sure it moves the needle.

Comments closed

Troubleshooting Azure SQL DB Elastic Jobs

Kate Smith wraps up a series on elastic jobs in Azure SQL Database:

This error means that the Elastic Job Agent cannot connect to the target server(s) because the target has some firewall rules blocking the connection requests.  Indeed – it is required that every target in the target group allows connections from Azure Services in order for Elastic Jobs to work.  To fix this, I go to the target server in the Azure Portal and click on the “Firewalls and virtual networks” item under “Security”.  Next, I toggle the “Allow Azure services” from OFF to ON, and save my changes.  

This has been an interesting series to read through, even though I don’t do much at all with Azure SQL Database.

Comments closed