Press "Enter" to skip to content

Curated SQL Posts

Mirroring an Azure SQL Database in Microsoft Fabric

Gilbert Quevauvilliers holds up a mirror:

Creating a Mirrored Azure SQL Database in Fabric

This week Microsoft announced the public preview of Database Mirroring in Microsoft Fabric (Announcing the Public Preview of Database Mirroring in Microsoft Fabric | Microsoft Power BI Blog).

I decided to see how easy it was to create a mirrored database in Fabric and below are my findings (PS it is AMAZING)

Click through for the demo. Though it does look like Gilbert has mirrored the contents of the blog post a few times as well, at least as of the time of my post here.

Comments closed

SSMS 20 and Mandatory Connection Security

Chad Callihan hits an annoyance:

I tried to run a new query for a CMS but the query window opened as disconnected. If I selected one server out of the group and tried to open a new query, I received an error that “A connection was successfully established with the server, but then an error occurred during the login process.”

That can get really tedious if you have a few hundred instances in your Central Management Server. Those errors would all go away if you set up certificates for the servers, but until then, it remains a major annoyance.
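
While you work through getting certificates in place, it can help to see which existing connections are actually encrypted. This is not from Chad’s post, just a quick check against the connections DMV:

-- Which current connections are encrypted, and how they authenticated.
SELECT c.session_id,
       c.encrypt_option,   -- TRUE when the connection is encrypted
       c.auth_scheme,
       s.login_name,
       s.host_name
FROM sys.dm_exec_connections AS c
JOIN sys.dm_exec_sessions AS s
    ON s.session_id = c.session_id;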

Comments closed

Parallel Vector Index Rebuild in Postgres

Semab Tariq takes a look at parallel index building in pgvector:

Parallel Index Build refers to the capability to build indexes using parallel processing. In simpler terms, it means that multiple workers or threads can be utilized simultaneously to create an index, which can significantly speed up the index creation process.

When performing an index build operation, PostgreSQL can divide the work among multiple parallel workers, each responsible for building a portion of the index.

Read on to learn more about this bit of functionality in pgvector 0.6 and the performance gains you can get from it.
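
As a rough sketch of what that looks like in practice (the table and embedding column here are hypothetical), pgvector 0.6 picks up parallel workers for an HNSW build from the usual maintenance settings:

-- Allow extra workers and memory for the build, then create the HNSW index.
SET max_parallel_maintenance_workers = 7;   -- workers in addition to the leader
SET maintenance_work_mem = '4GB';
CREATE INDEX ON items USING hnsw (embedding vector_l2_ops);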

Comments closed

Processing GitHub Data with Kafka Streams

Lucia Cerchie hits the GitHub API:

GitHub’s data sources (REST + GraphQL APIs) are not only developer-friendly, but a goldmine of interesting statistics on the health of developer communities. Companies like OpenSauced, LinearB, and Tidelift can measure the impact of developers and their projects using statistics gleaned from GitHub’s APIs. The results of GitHub analysis can change both day-to-day and over time.

Apache Kafka is a large and active open source project with nearly a million lines of code. It also happens to be an event streaming platform. So why not use Apache Kafka to, well, monitor itself? And learn a bit about Kafka Streams along the way?  

Click through for the full article, including a demonstration.

Comments closed

Working with INTERSECT and EXCEPT

Erik Darling wounds me:

I have never once seen anyone use these. The most glaring issue with them is that unlike a lot of other directives in SQL, these ones just don’t do a good job of telling you what they do, and their behavior is sort of weird.

Unlike EXISTS and NOT EXISTS, which state their case very plainly, as do UNION and UNION ALL, figuring these out is not the most straightforward thing. Especially since INTERSECT has operator precedence rules that many other directives do not.

I’ve used EXCEPT to check if two datasets are equivalent for testing purposes: A EXCEPT B should be zero rows, and B EXCEPT A should be zero rows. It has built-in handling of any NULL madness. Set intersections have their uses as well.
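
That equivalence check looks something like this (table and column names are made up):

-- Rows in A missing from B; should return nothing if the sets match.
SELECT col1, col2 FROM dbo.ResultA
EXCEPT
SELECT col1, col2 FROM dbo.ResultB;

-- And the reverse, since EXCEPT is not symmetric.
SELECT col1, col2 FROM dbo.ResultB
EXCEPT
SELECT col1, col2 FROM dbo.ResultA;

EXCEPT treats two NULLs as matching, which is why the comparison doesn’t need any special NULL handling.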

Comments closed

Working with TRY-CATCH in SQL Server

Steve Jones gives it the ol’ college try:

This is a common error handling technique in other languages. C# uses it, as does Java, while Python has TRY EXCEPT. There are other examples, but these are good habits to get into when you don’t know how code will behave or if there is something in your data or environment that could cause an issue.

In SQL, I think many of us get used to writing one statement in a query and forget to do error handling, or transactions. However, this can be a good habit as your code might grow and people might add more statements that should execute.

Read on for a few examples of how to use SQL Server’s TRY-CATCH functionality. It’s not perfect, but as Steve shows, there are definitely good uses for it.
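
The core pattern, as a minimal sketch (the table here is invented), wraps a transaction so a failure in any statement rolls the whole thing back:

BEGIN TRY
    BEGIN TRANSACTION;

    UPDATE dbo.Accounts SET Balance = Balance - 100 WHERE AccountId = 1;
    UPDATE dbo.Accounts SET Balance = Balance + 100 WHERE AccountId = 2;

    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION;
    THROW;   -- re-raise the original error to the caller
END CATCH;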

Comments closed

Comparing pg_basebackup Compression Settings

Kaarel Moppel puts on the lab coat and safety glasses:

In my last post I did a quick check on the performance of the newer (lz4, zstd) pg_dump compression options, which included setting up a small framework to download some openly available “real life”-ish sample datasets. And the general result was that, indeed – the new algos in lower levels provide the best value, especially zstd.

But pg_dump is about compressing essentially text-based data, so how about binary Postgres data? Thus the additional tool to test here is pg_basebackup, with its newer (v15+) compression options. So let’s see if something stands out consistently again.

Click through for the test results.

Comments closed

DBCC CLONEDATABASE and Production Deployments

Madhumita Tripathy takes a step back:

The DBCC CLONEDATABASE command generates a schema-only copy of a database. Effective March 1, 2025, Microsoft will discontinue support for the use of the DBCC CLONEDATABASE command as a tool to copy databases to production environments. The command is intended for diagnostic and troubleshooting purposes only. Any use of the command for purposes other than those specified will not be supported by Microsoft from the aforementioned date. All editions of Microsoft SQL Server 2016 and later versions are affected by this change.

Now I’m a bit curious about the why behind this post.
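
For reference, the command itself is a one-liner. A sketch with made-up database names (the WITH options control what gets copied and verified):

-- Schema-only copy for diagnostics; not something to attach as a production database.
DBCC CLONEDATABASE (SalesDb, SalesDb_Clone)
    WITH NO_STATISTICS, NO_QUERYSTORE, VERIFY_CLONEDB;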

Comments closed

Using dtplyr to Process Large Datasets

Dario Radecic takes us through an interesting library:

In a world where compute time is billed by the second, make every one of them count. There are zero valid reasons to utilize only a quarter of your CPU and memory, but achieving complete resource utilization isn’t always a straightforward task. That is, if you don’t know about R dtplyr.

One option is to use dplyr. It’s simple to use and has intuitive syntax. But it’s slow. The other option is to use data.table. It’s lightning-fast but has a steep learning curve and syntax that’s not too friendly to follow. The third – and your best option – is to combine the simplicity of dplyr with the efficiency of data.table. And that’s where R dtplyr chimes in!

Today you’ll learn just how easy it is to switch from dplyr to dtplyr, and you’ll see hands-on the performance differences between the two. Let’s dig in!

I love the performance of the data.table library but strongly prefer the Tidyverse for the sake of convenience. I like that this bridges the gap, at least for dplyr-style processing. H/T R-Bloggers.

Comments closed

Going with the Flow: GitHub Edition

I have a new video:

In this video, I cover the GitHub flow. We talk a bit about branching strategies and how GitHub development ought to look in a multi-developer situation.

The GitHub flow is a minor variant of the classical Git flow, but one that works well with the ethos of development specifically in GitHub. It’s often overkill for a single-developer repo, but once you have a team working on a problem, this is a much more efficient approach.

Comments closed