Press "Enter" to skip to content

Curated SQL Posts

Goodbye, Azure ML SDK v1

I have a new video:

In this video, I cover some news from Microsoft around the deprecation of the Azure Machine Learning SDK v1. We’ll take a look at the upgrade guide and see what it will take to perform this upgrade.

Microsoft will still support the SDK v1 until September of 2026, so we have a year to get code sorted out. The CLI v1, however, will go away sooner, so be sure you’re keeping up on that.

Leave a Comment

End of Month Testing in Powershell

Andy Levy checks if this is the end of the month:

This is one of those blog posts you write so that 2 years later, you can look it up to remind yourself how to do something.

I found myself needing to figure out if “today” was the end of the month in PowerShell. In T-SQL, this is easy, as we have the EOMONTH() function. But PowerShell (the .NET System.DateTime struct) doesn’t have the same thing.

Read on for the solution Andy came up with.

Leave a Comment

Reading Delta Tables via SQL Code in a Microsoft Fabric Python Notebook

Gilbert Quevauvilliers writes a SQL statement:

I come from a TSQL background, so using SQL makes it easy for me to work with data.

There are multiple ways to use SQL in a PySpark notebook, and when I started using a Python notebook it was not so straightforward.

In this blog post I will show you how I use SQL Code.

As mentioned previously I am by no means an expert, I typically find a way that works, is fast and doesn’t consume a lot of capacity. If that works consistently for me then that is how I go about it.

Click through for the solution, which uses DuckDB. As such, the SQL syntax isn’t T-SQL—it’s more like psql. But it does do a great job of interacting with Parquet files and Delta tables.

Leave a Comment

Queue-Style Batch Deletion Woes in PostgreSQL

Shayon Mukherjee runs into an interesting issue:

I recently discovered an unexpected behavior in PostgreSQL involving a pattern of using a Common Table Expression (CTE) with DELETE ... RETURNING and LIMIT to process a batch of items from a queue-like table. What seemed straightforward turned out to have a surprising interaction with the query planner.

Click through to see what happened, as well as Shayon’s recommendation on how to fix it.

Leave a Comment

Tips for Using Database Snapshots

Stephen Planck offers some advice:

Database snapshots are one of those features that’s been around forever, but still solves real-world problems with very little setup. In a single statement you can capture a point-in-time, read-only copy of any user database, use it for reporting or off-load testing, and—if disaster strikes—revert the source back to that snapshot in minutes. This guide explains how snapshots work under the hood, walks through day-to-day tasks (including creating the original database), and highlights the pitfalls you should plan for before using them in production.

One additional tip that I’d offer: limit yourself to one database snapshot per database. There’s a performance cliff you can hit when you get fancy and try to have multiple database snapshots going at the same time off of the same database.

Leave a Comment

Creating Error Bars in ggplot2

Zhenguo Zhang draws a chart:

Sometimes you may want to create a plot with the following features:

  • a point to indicate the mean of a group
  • error bars to indicate the standard deviation of the group
  • and each group may have subgroups, which are represented by different colors.

In this post, I will show you how to create such a plot using the ggplot2 package in R.

Read on for the demonstration, as well as fixing a common problem of overlapping data points. H/T R-Bloggers.

Leave a Comment

Kafka Data Exploration with Tableflow

Robin Moffatt does some exploratory data analysis:

One of the challenges that I’d always had when it came to building streaming data pipelines is that once data is in a Kafka topic, it becomes trickier to query. Whether limited by the available tools to do this or the speed of access, querying Kafka is just not a smooth experience.

This blog post will show you a really nice way of exploring and validating data in Apache Kafka®. We’ll use Tableflow to expose the Kafka topics as Apache Iceberg™️ tables and then query them using standard SQL tools.

Click through for the demonstration using a real dataset.

Leave a Comment

SQL Server Performance Office Hours

Erik Darling is answering questions again:

My company (using SQL Server 2019 Standard) has an algorithm that keys addresses into a varchar(40) has a cross-reference database that assigns an identity property to each new value, allowing us to post the numeric in our datasets. Production has to search their generated string keys in this database’s table to get the integer key in return. We have ensured proper string data typing on lookups and have unique compressed indexes on the string values. What would your next tuning step be if this was not getting the performance you needed?

There’s a good set of questions this time, so click through for Erik’s answers.

Leave a Comment

From Power BI Premium Capacity to Fabric Capacity

Jon Vöge performs a migration:

So your old Power BI Premium Capacity has run/is running out, and your organization is acquiring a new Fabric Capacity to replace it.

Perhaps the organization even decided to take the chance to move the capacity region to something a little closer to home?

If you find yourself in this situation, how do you best migrate your contents of one Capacity to another?

Read on as Jon explains the migration process within a region (which is very easy) and the migration process if you need to go cross-region (which is rather cumbersome).

Leave a Comment