Press "Enter" to skip to content

Curated SQL Posts

The Monty Hall Problem

I have a new video:

In this video, I explain the classic Monty Hall problem, based on the concept of the show Let’s Make a Deal. I explain the paradox behind the problem and demonstrate that it’s better to switch doors.

I’m not joking at all when I say it took me years of listening to explanations before it actually clicked. Some of it is my innate stubbornness, but I think this is a great example of a true paradox, where the intuitive answer is wrong and first-level reasoning also leads you astray.

Comments closed

Testing ChatGPT with Bad Advice

Louis Davidson continues a series:

As started in part 1 of this series, I have set out to test an LLMs ability to technical edit. For my first set of tests, I am using a pair of articles I created, filled with very bad advice. The advice is the same for both articles, but what differs is the intro and the conclusion. One says the advice is good, the other said it is bad. It is all very very bad, including a really terrible SELECT statement versus loop construct that will cause an eternal loop that inserts into a temporary table.

My goal is to see how much of that advice will be noted as bad, and if it says anything nice at all about the text, etc. If you want to see the entire documents, you can get them here in a zip file, both in text and word document formats.

Starting with an extreme example like this is fine, I believe. Given the results, they were fine, though it sounds like Louis won’t be out of a job anytime soon.

Comments closed

Finding Object Dependencies in SQL Server

Vlad Drumea is looking for matches:

Figuring out object dependencies in a SQL Server database, especially one you’re not familiar with, can be a challenge.
Luckily, Microsoft provides the sys.sql_expression_dependencies system catalog view to help DBAs and developers track down various object dependencies.

I leveraged this catalog view in a query that returns the necessary info in a more user-friendly way, while also allowing easy filtering for referenced or referencing object.

Click through for the script. It’s been a while since I’ve used this DMV in anger, though my recollection is that I had problems finding dependencies over linked servers.

Comments closed

The WINDOW Clause in SQL Server 2022

Andy Brownsword takes a look at one quality of life improvement in SQL Server 2022:

Window functions allow us to perform a function across a set of rows in a result set, rather than how we might typically group them. In SQL Server 2022 we have a new clause available for our queries which can help tidy up how these are defined.

You can do some neat things with the WINDOW clause, including partial matches. For example, you could define a window r1 AS (PARTITION BY x.MyColumn) and then another window r2 AS (r1 ORDER BY x.SomeOtherColumn). I don’t think there are a huge number of scenarios in which this is helpful, especially because performance typically depends upon minimizing the unique number of window functions in your query, but every once in a while it’s a really neat thing.

Comments closed

Microsoft Fabric SKU Estimator

Jonathan Garriss has an announcement:

We’re excited to unveil the Microsoft Fabric SKU estimator, now available in preview—an enhanced version of the previously introduced Microsoft Fabric Capacity Calculator. This advanced tool has been refined based on extensive user feedback to provide tailored capacity estimations for businesses.

Designed to optimize data infrastructure planning, the Microsoft Fabric SKU Estimator helps customers and partners accurately assess capacity requirements and select the most suitable SKU for their workloads, protecting users from under-provisioning and overcommitment.

And, in classic Microsoft Fabric fashion, it’s in preview.

Coming up with some fairly low estimates for a lot of things, it bounced me between an F32 and an F64, which seemed about right.

Comments closed

SID Mismatches on SQL Server Logins

Chad Callihan fixes a mismatch:

Just as you can’t judge a book by its cover, you can’t judge a SQL Server login by its name.

With availability groups, you may notice a login on a primary replica and a “matching” login on the secondary replica. But just because the login is named the same on each replica doesn’t mean they’re the same to SQL Server.

Click through for the explanation and how you can fix it. The other way to fix it is to use Active Directory (or Entra ID) logins, though that’s not always in the cards.

Comments closed

choroplethr 4.0.0 Now in CRAN

Ari Lamstein has an announcement:

With this version, I have transferred the maintenance of choroplethr to Zhaochen He, an economics professor at Christopher Newport University. Zhao addressed the issues that led to choroplethr being archived from CRAN in February. Please join me in thanking Zhao for his contribution!

Click through for the updates, as well as what Ari views as the current challenges for the project as he hands the project over Zhaochen He. H/T R-Bloggers.

Comments closed

Vector Indexing in PostgreSQL

Hans-Jürgen Schönig builds some indexes on vector data:

In the previous post we have imported a fairly large data set containing Wikipedia data, which we downloaded using pgai. However, importing all this data is not enough because we also need to keep an eye on efficiency. Therefore, it is important to understand that indexing is the key to success.

Click through for an overview of what options are available for indexing vectors in PostgreSQL, as well as the trade-offs and disk space requirements.

Comments closed

Comparing Varieties of Statistics in SQL Server

Kendra Little gets the smorgasbord:

Statistics in SQL Server are simple in theory: they help the optimizer estimate how many rows a query might return.

In practice? Things get weird fast. Especially when you start filtering on multiple columns, or wondering why the optimizer thinks millions of rows are coming back when you know it’s more like a few hundred thousand.

In this post, I’ll walk through examples using single-column, multi-column, and filtered statistics—and show where estimates go off the rails, when they get back on track, and why that doesn’t always mean you need to update everything with FULLSCAN.

Read on for a review of the three types of statistics. Admittedly, I’ve never had much luck with filtered statistics improving the performance of queries. If I were to speculate, I’d say that they’re good for a very specific type of problem that maybe I just don’t run into that often.

Comments closed

Purview DLP Updates

Yael Biss has an announcement:

Microsoft Purview’s Data Loss Prevention (DLP) policies for Fabric now supports Fabric KQL and Mirrored DBs!

Purview DLP policies help organizations to improve their data security posture and comply with governmental and industry regulations. Security teams use DLP policies to automatically detect upload of sensitive information to Microsoft 365 applications like SharePoint and Exchange, and to Fabric’s semantic models and lakehouses.

And another one:

In today’s fast-paced data-driven world, enterprises are building more sophisticated data platforms to gain insights and drive innovation. Microsoft Fabric Lakehouses combine the scale of a data lake with the management finesse of a data warehouse – delivering unified analytics in an ever-evolving business landscape. But with great data comes great responsibility. Protecting sensitive information and ensuring regulatory compliance is paramount. That’s where Data Loss Prevention (DLP) policies with restricted access come into play.

Click through to see what this preview currently offers.

Comments closed