Press "Enter" to skip to content

Author: Kevin Feasel

Comparing Sets of Data in T-SQL

Louis Davidson figured out which of these was not like the others, which of these just didn’t belong:

There are many occasions when we want to see the differences between two sets of data. Sometimes a whole table, a subset of a table, or even the results from a couple of queries, like in a unit test.

Maybe you want to see that two sets are exactly the same, for example domain table in DEV, PROD, or maybe even from source control. You might you have a orders table and an orders_history table and you want to see the overlap/changes over a given period of time, like for example, to clean out any useless history.

No matter what the reason, there is a query pattern that will work for you. In this blog I will demonstrate several of these techniques and why you might want to use them in different places.

Click through for those techniques. I am particularly fond of INTERSECT/EXCEPT because of how it handles missing data and typically performs quite well.

Leave a Comment

Exploiting Database as a Service SQL Server

Fabiano Amorim has a post-mortem of a nasty privilege escalation bug in SQL Server:

This article explores a serious flaw in this security model – a SQL Injection vulnerability in sys.sp_help_spatial_geography_histogram that allowed a standard user on managed SQL Server instances (AWS, GCP, Alibaba, Azure) to completely bypass these restrictions, gain access to privileged user data, and decrypt the source code of internal management stored procedures.

It was fixed in SQL Server 2022 CU20 (KB5063814), but this article explains how the exploit worked and the process of eliminating it.

Click through for the details and make sure your on-premises SQL Server instances are patched.

Leave a Comment

Windows Clustering for the DBA

Sandra Delany is perilously close to running afoul of Betteridge’s Law of Headlines:

Should a SQL Server DBA know how a Windows cluster works, and or how to create a Windows cluster, or troubleshoot a cluster? Or should we, as DBAs, stay in our lane? In some organizations, a line is drawn between what a DBA can and can’t do and the System Administrator (SA) has the Infrastructure roles and responsibilities. This is fine, but that doesn’t mean a DBA shouldn’t know how to troubleshoot a FCI (Failover Cluster Instance) or AG (Availability Group) issue after an unplanned cluster-level failover.  (Yes – you can create an AG without a cluster, but that will not be taken into consideration here.)

Read on for some quick tips to get ramped up on what is available in a Failover Cluster Instance.

Leave a Comment

Azure Tenants and Microsoft Fabric

Andy Cutler begins a new series on Microsoft Fabric architecture:

Our Fabric Architecture journey starts with Azure Tenants (the kick-off blog in this series is here with a few jumping-off links to get started with thinking about Fabric Architecture). If you’re ready to spent time sketching out your Fabric Capacity planning, workspace strategy, domain topology, lakehouse/warehouse creation, data loading processes…you might want to stop for a minute and think about tenants. The question I’d like you to consider is What do I need to know when working with a single or a multi-tenancy approach? Let’s unpack this question because while it might sound like a simple list, it actually shapes your governance, scalability, and Fabric operational model. If you’re a seasoned Azure Architect veteran then you already know how to decide between single and multi-tenant cloud rollouts (also, please comment if you have anything to add please), if you work with Fabric/Data and don’t really dive into Azure architecture on a daily basis then please stick around. Hopefully this blog gets you thinking about single/multi-tenant architectures and the benefits/costs.

Read on for a dive into what tenants are, the benefits of single- versus multi-tenancy, and how it all ties into Fabric.

Leave a Comment

Function Volatility and PostgreSQL Partition Performance

Deepak Mahto covers how function volatility can affect how queries on partitioned data perform:

In one of our earlier blogs, we explored how improper volatility settings in PL/pgSQL functions — namely using IMMUTABLESTABLE, or VOLATILE — can lead to unexpected behavior and performance issues during migrations.

Today, let’s revisit that topic from a slightly different lens. This time, we’re not talking about your user-defined functions, but the ones PostgreSQL itself provides — and how their volatility can quietly shape your query performance, especially when you’re using partitioned tables.

Click through for one example using date-based partitioning and date functions.

Leave a Comment

Working with JSON Indexes in SQL Server 2025

Koen Verbeeck tries out a new index type:

We’re trying the new JSON data type in SQL Server for data stored as JSON in a table. When we query it using functions such as JSON_VALUE, we see a full table scan is performed for each query. Is there a way we can index the JSON to improve performance?

The JSON index has a somewhat different definition of its structure and there are some limitations to how it works, but for specific JSON-related queries, you can see the improvement.

3 Comments

Calling Logic Apps via Data Factory Pipelines

Andy Brownsword flips the script:

Last week we looked at calling a Data Factory Pipeline from a Logic App. This week I thought we’d balance it out by taking a look at calling a Logic App from an Azure Data Factory (ADF) Pipeline.

When building the Logic App last week we had to create our own polling mechanism to check for completion of the pipeline. The process is much simpler in the opposite direction. I specifically want to highlight two approaches, and save some pennies whilst we’re at it.

I am all about saving pennies, so be sure to check out that section as well.

Leave a Comment

Creating Your Own ggplot2 Geom

Isabella Velasquez is feeling creative:

If you use ggplot2, you are probably used to creating plots with geom_line() and geom_point(). You may also have ventured into to the broader ggplot2 ecosystem to use geoms like geom_density_ridges() from ggridges or geom_signif() from ggsignif. But have you ever wondered how these extensions were created? Where did the authors figure out how to create a new geom? And, if the plot of your dreams doesn’t exist, how would you make your own?

Enter the exciting world of creating your own ggplot2 extensions.

The post looks a lot like a series of slides, and it takes you through the process of creating a new geom. H/T R-Bloggers.

Leave a Comment

Consider What You Count

Hans-Jürgen Schönig ran out of fingers:

The purpose of this post is not to share some fancy SQL techniques, some super cool magic AI tool that does whatever new hot thing or some crazy coding technique one can use in PostgreSQL. This is all about two simple things that are often forgotten or just neglected. We are of course talking about “awareness” and “clarity”.

Yes, it is that simple. If you don’t know what you have or if you don’t know what you want, all data is worthless (in a best case scenario) or even dangerous. 

The concept for discussion here is a simple count: how many customers do you have? But even fairly simple questions like this can cause difficulty in answering because of business complexities that we need to model in our databases. The problem is that the expedient answer may not be the correct one. This is also a key reason why we end up with debates in meetings about which value of revenue to use.

Leave a Comment