Press "Enter" to skip to content

Curated SQL Posts

Debugging in Databricks

Chen Hirsh enables a debugger:

Do you know that feeling, when you write beautiful code and everything just works perfectly on the first try?

I don’t.

Every time I write code It doesn’t work in the beginning, and I have to debug it, make changes, test it…

Databricks introduced a debugger you can use on a code cell, and I’ve wanted to try it for quite some time now. Well, I guess the time is now 

I’m having trouble in finding the utility for a debugger here. Notebooks are already set up for debugging: you can easily add or remove cells and the underlying session maintains state between cells.

Comments closed

Querying a Fabric KQL Database via REST API

Sandeep Pawar grabs some data:

I have previously explained how to query a KQL database in a notebook using the Kusto Spark connector, Kusto Python SDK, and KQLMagic. Now, let’s explore another method using the REST API. Although this is covered in the ADX documentation, it isn’t in Fabric (with example), so I wanted to write a quick blog to show how you can query a table from an Eventhouse using a REST API.

Click through to see how you can do it. Sandeep’s code is in Python but because this is just hitting a REST API rather than using a library, you could also use some tool like Postman.

Comments closed

Aggregate Functions in Power BI

Hristo Hristov writes some DAX:

At times when using Power BI, you want to combine your data to produce an aggregated value. The aggregation is performed over some criteria – frequently this may be time (year, month, date) or a categorical value. Some popular aggregation functions to apply can be Sum, Average, Maximum, Minimum, or Count. Typically, Power BI applies certain aggregations by default when adding data fields to visualizations. What if you wanted to create your own data aggregations? To achieve better understanding of the underlying data, how can you attain fine-grained control over the aggregations?

Read on for several DAX measures, including totals, running totals, moving averages, and day over day changes.

Comments closed

Open Order Computation with Visual Calculations in DAX

Marco Russo and Alberto Ferrari want to track open orders:

Open orders (or events in progress) is an extremely common pattern in business intelligence. It answers a simple question: given two dates – order received and order delivered – how many orders have yet to be delivered at any given point? We do have a pattern here: Events in progress – DAX Patterns that solves the scenario with DAX measures.

While demonstrating the pattern during a classroom course, one student (thanks Justin Duff!) asked whether the pattern could be solved by using visual calculations. It turns out that visual calculations can be of great help in optimizing the performance of this specific scenario because they greatly reduce the number of calculations required to solve it. Well… sort of. Visual calculations might perform well, but we will do better with DAX alone!

Read on for the examples and why visual calculations might not be the best fit for this scenario.

Comments closed

Creating Totals and Subtotals in Postgres with ROLLUP and CUBE

Elizabeth Christensen uses a pair of analytical operators:

Postgres is being used more and more for analytical workloads. There’s a few hidden gems I recently ran across that are really handy for doing SQL for data analysis, ROLLUP and CUBE. Rollup and cube don’t get a lot of attention, but follow along with me in this post to see how they can save you a few steps and enhance your date binning and summary reporting.

I’ve used ROLLUP on occasion, but never found a great case when CUBE made sense in a report. I am, however, quite partial to GROUPING SETS, the third of these analytical operators and the one that gives you the most control.

Comments closed

Deadlock Resolution and Prevention in SQL Server

Eitan Blumin works through some deadlock issues:

Deadlocks in SQL Server can be frustrating and can cause significant performance and reliability issues. A deadlock occurs when two or more transactions are waiting for each other to release a lock on a resource, resulting in a situation where no transaction can proceed, and eventually, one of them is automatically killed and rolled back. This can happen when two transactions try to access the same data in a different order or when one transaction holds a lock on a resource while waiting for a lock held by another transaction. In this blog post, we’ll discuss how to troubleshoot and prevent deadlocks in SQL Server.

Click through for a way to get information on deadlocks, as well as three techniques for reducing the risk of deadlocks occurring.

Comments closed

Using Kubernetes with Distributed Availability Groups

Andrew Pruski has a guide for us:

A while back I wrote about how to use a Cross Platform (or Clusterless) Availability Group to seed a database from a Windows SQL instance into a pod in Kubernetes.

I was talking with a colleague last week and they asked, “What if the existing Windows instance is already in an Availability Group?”

This is a fair question, as it’s fairly rare (in my experience) to run a standalone SQL instance in production…most instances are in some form of HA setup, be it a Failover Cluster Instance or an Availability Group.

Read on for the tutorial. There are quite a few steps involved.

Comments closed

Updates on the Spark Connect Dotnet Library

Ed Elliott has an update for us:

There have been quite a few changes in the last couple of months and I just wanted to give a quick update on the current state of the project. In terms of usage I am starting to hear from people using the library and submitting pr’s and requests so although usage is pretty low (which is expected from the fact that the Microsoft supported version usage wasn’t very high) it is growing which is interesting.

Read on for thoughts on production readiness, support for Spark 4.0, a couple of other updates, and some future plans.

Comments closed

Creating a Numbers Table in Power Query

Koen Verbeeck builds a tally table:

Sometimes you need a numbers table (also called tally table) when you’re working with data. It’s a table with one single integer column that contains incremental numbers (0,1,2,3,4,…). In a database, it’s fairly easy to generate using either cross-joins or a generator function (GENERATE_SERIES in T-SQL). Recently, I needed such a table in Power BI Desktop. Suppose you don’t have a database as a source (or maybe you cannot change the database), so we need to generate this in Power Query.

Read on to see how.

Comments closed

Comparing Numeric Data Types between MySQL and PostgreSQL

Aisha Bukar compares data types:

Numeric data types serve as a foundation in DBMS. It helps represent numbers in different forms which helps in storing and manipulating arithmetic operations on numerical data.

When dealing with numeric data types in databases like MySQL and PostgreSQL, it’s important to understand key concepts like range, precision(fixed-point numbers, floating-point numbers), and storage. This helps you understand how data is stored, processed, and retrieved.

Read on for quick descriptions of the three primary properties of numeric data types (range, precision, and storage), as well as how the two relational database platforms handle numeric data.

Comments closed