Press "Enter" to skip to content

Curated SQL Posts

RANKX on Multiple Columns in DAX

Alberto Ferrari walks us through ranking based on multiple columns:

DAX offers the RANKX function to compute ranking over a table, based on measures or columns. One limitation of RANKX is that it is only capable of ranking using a single expression. Oftentimes it is necessary to use multiple columns to obtain a ranking, either because the business requirement dictates it, or because you want to rank ties with different criteria.

As a demonstration, we rank customers based on their purchase volume. To artificially introduce ties, we use the Rounded Sales measure, that rounds the sales amount to the nearest multiple of one thousand. Using Rounded Sales, several customers show the same amount of 10,000.00. Because they are ties, their ranking must now be defined by alphabetical order based on their names.

Read on for two methods to solve this problem.

Comments closed

Building a Backup Plan

Greg Larsen takes us through the steps of developing out a backup plan:

You might be wondering why you need to develop a backup plan. Can’t a DBA just implement a daily backup of each database and call it good? Well, that might work, but it doesn’t consider how an application uses a database. If you have a database that is only updated with a nightly batch process, then having a daily backup of the database right after the nightly update process might be all that you need. But what if you had a database that was updated all day long from some online internet application. If you have only one backup daily for a database that gets updated all day online, then you might lose up to a day’s worth of online transactions if it was to fail right before the next daily backup. Losing a day’s worth of transaction most likely would be unacceptable. Therefore, to ensure minimal data loss occurs when restoring a database, the backup and recovery requirements should be identified first before building a backup solution for a database.

The biggest non-secret here is that backup plans are ultimately business decisions rather than technical decisions. Greg then outlines several cases and provides considerations for each.

Comments closed

DAX Formatter for Power BI Desktop

Phil Seamark has a new tool for us:

Last week I was honoured to take part in the latest edition of the Power BI Dev Camp which is run by my colleague Ted Patterson. It was a fun session which I enjoyed.

As part of the Dev camp, I walked through some of my recent Visual Studio Code based blog posts on how to perform various tasks against models hosted in Power BI desktop.

While preparing for the session, Ted and I agreed that it might be helpful to create a small external tool that could automatically format all DAX expressions in a Power BI model. The idea is to leverage the excellent DAX Formatter API provided by the good folks at SQLBI. This API is the same endpoint used when you format your DAX using DAX Studio.

Read on for more details.

Comments closed

Delta Versus Lambda Architectures

Hector Leano compares the delta and lambda architectures:

Generally, a simple data architecture is preferable to a complex one. Code complexity increases points of failure, requires more compute to run jobs, adds latency, and increases the need for support. As a result, data pipeline performance degrades over time, increasing costs while decreasing productivity as your data engineers spend more time troubleshooting and downstream users wait longer for data refreshes.

Complexity was perceived as a necessary evil for the automated data pipelines feeding business reporting, SQL analytics, and data science because the traditional approach for bringing together batch and streaming data required a lambda architecture. While a lambda architecture can handle large volumes of batch and streaming data, it increases complexity by requiring different code bases for batch and streaming, along with its tendency to cause data loss and corruption. In response to these data reliability issues, the traditional data pipeline architecture adds even more complexity by adding steps like validation, reprocessing for job failures, and manual update & merge.

On the one hand, lambda was always intended to be a compromise architecture based on the tools of the time. On the other hand, take this with as many grains of salt as you need given that the post comes from the primary company responsible for delta.

Comments closed

Scaling ksqlDB, with Animations

Michael Drogalis walks us through scaling models with ksqlDB:

Software engineering memes are in vogue, and nothing is more fashionable than joking about how complicated distributed systems can be. Despite the ribbing, many people adopt them. Why? Distributed systems give us two things their single node counterparts cannot: scale and fault tolerance.

ksqlDB, the event streaming database, is built with a client/server architecture. You can run it with a single server, or you can cluster many servers together. Part 1 and part 2 in this series explained how a single server executes stateless and stateful operations. This post is about how these work when ksqlDB is deployed with many servers, and more importantly how it linearly scales the work it is performing—even in the presence of faults.

If you like, you can follow along by executing the example code yourself. ksqlDB’s quickstart makes it easy to get up and running.

Click through for well-animated examples.

Comments closed

Database Normal Forms

Joe Celko walks us through key and less-key normal forms:

Even before RDBMS, we had network and hierarchical databases. Their first goal was to remove redundancy. We want to store one fact, one way, one place, and one time. Normalization goes a step further. The goal of normalization is to prevent anomalies in the data. An anomaly could be an insertion anomaly, update anomaly, or deletion anomaly. This means that doing one of those basic operations destroys a fact or creates a falsehood.

It’s an interesting read on a sadly-neglected topic.

Comments closed

Write-Host and Modern Powershell

Adam Listek brings us up to date on the utility of Write-Host in Powershell:

Over the years, there has been a lot of debate around the PowerShell Write Host cmdlet on whether it’s needed and when to use it. As in most cases, the answer is maybe and dependent on your needs.

To understand Write-Host and when to use it, let’s quickly explore the history of Write-Host and learn how best to use this useful cmdlet.

Read on to learn more.

Comments closed

Power BI the Right Way: Separating Data Models and Reports

Paul Turley continues a series on doing Power BI the right way:

Back in the day, when we created BI solutions, reports and data models were separate. If you created a cube or Tabular model with Analysis Services, it was developed with Visual Studio and deployed to a server. Reports could be authored and deployed to the report server, separately. Now, with Power BI Desktop, you author your data model and report in the same development space and then deploy the whole kit and kaboodle to the service at once.

Secretly, Power BI actually separates the report from the data model (called a dataset in the service) and gives them both the same name. This is very convenient for self-service projects because it is quick and easy to make changes to the data model, queries and measures if you need to make a report enhancement. This is all well and good for small, one developer projects but what about those larger scale solutions where the data model and reports are developed by different folks, or when multiple reports are connected to the same data model?

At what point does it make sense to separate the data model and reports into separate files?

Read on and let Paul illuminate.

Comments closed

Uncommon SQL Tricks

Shane O’Neill has a bandolier of SQL tricks to show off:

Recently the DBA Team Lead and I were reviewing some SQL code, and we came across some SQL that neither of us had frequently encountered before. This led to a brief watercooler moment where we shared some uncommon SQL that we had seen. Perfect blog post material, I think.

I had previously learned about ODBC date functions from Shane and also learned about CURRENT in this post, so check it out.

Comments closed

Deploying to Azure SQL Database via GitHub Actions

Kevin Chant shows us how to use GitHub Actions to deploy updates to Azure SQL Database:

After my last post I wanted to test deploying to Azure SQL Database using GitHub Actions. To check that it all runs smoothly.

By the end of this post you will have some ideas about how you can deploy Azure SQL Databases using GitHub Actions. Both for a basic test and more complex deployments.

In my last post here I showed you how you can use GitHub Actions to deploy a free monitoring framework called SQLWATCH to on-premises versions of SQL Server. I thought I would test using the same Azure SQL Deploy Action for Azure SQL database deployments.

Click through for the process.

Comments closed