Press "Enter" to skip to content

Curated SQL Posts

Architecting Your First Microservice

Bijoy Choudhury builds a process:

In any microservices migration, extracting services from all their dependencies and point-to-point integrations carries the most risk. If you feel hesitant about decomposing your application, that hesitation is justified. The first service extraction is uniquely challenging because you have to examine years of accumulated technical debt and unresolved organizational decisions at the same time. 

That’s why the objective for the first service extraction should not focus on achieving immediate scalability or to redefine organizational practices but to validate a narrow capability. Instead, it’s about identifying a discrete unit of functionality that can be isolated, deployed independently, and integrated with the existing system without rewriting the entire system or introducing instability.

There’s some good advice in here, as well as one reason why I’m not totally sold on microservices: the isolation of databases. This sounds great until you’re hitting seven different services to retrieve data 100x slower than a simple SQL query would have been because you have complex filtering criteria across these seven services. And then you build an extra layer of caching, introducing even more complexity to solve a problem that never needed to exist.

Leave a Comment

Working with Recent Data in Dataflows Gen2

Penny Zhou sees recent datasets:

How much time do you spend navigating to the same data sources when building dataflows? Data preparation is an iterative process—you often return to the same sources as you refine your dataflows, add new transformations, or create similar workflows. If you find yourself repeatedly connecting to the same tables, files, or databases, the Recent data module in Dataflow Gen2 is designed for you. This feature reduces friction by providing quick access to your most frequently used data items, letting you focus on the transformation logic rather than navigation.

Click through to see how you can access the Recent data menu and what it includes.

Leave a Comment

Performance Tuning Dependent SQL Queries in DirectQuery Mode

Chris Webb tries a change:

As I described here, Power BI can send SQL queries in parallel in DirectQuery mode and you can see from the Timeline column there is some parallelism happening here – the last two SQL queries generated by the DAX query run at the same time – but everything has to wait for that first SQL query to complete. Why? Can this be tuned?

Click through for an example. I was thinking about how challenging it would be to improve this performance at the SQL query level and if you could build a single query that operates over all three sets of data—distinct customers, distinct customers on Mondays, distinct customers in Januaries–while still performing acceptably. I’m not sure that the variants I sketched out in my head would actually perform faster, thanks to the “distinct” requirements.

Leave a Comment

A Primer on Data Storage in PostgreSQL

Grant Fritchey shares some thoughts:

The whole idea behind a database is the ability to persist the data. You want your inventory of widgets to get stored so you can look at it later. That means writing out to disks. However, what is writing to disk and where is it being written? Unlike SQL Server which has one (or more) big file for all data, PostgreSQL has a collection of a large number of files. There is a methodology and structure to these files that you need to understand in order to later understand how the data gets written to and retrieved from these files.

While we’re going to be very focused on file, page, folder, etc., throughout this article, that’s just part of the physical nature of persisting your data. What is being persisted is still the logical information you’re most interested in – rows and columns. I just wanted to emphasize the distinction between the two here.

Click through to see how PostgreSQL stores information.

Leave a Comment

A Primer on dbt against DuckDB

Robin Moffatt shares a tutorial on dbt:

In 2022 I made a couple of attempts to learn dbt, but it never really ‘clicked’.

I’m rather delighted to say that as of today, dbt has definitely ‘clicked’. How do I know? Because not only can I explain what I’ve built, but I’ve even had the 💡 lightbulb-above-the-head moment seeing it in action and how elegant the code used to build pipelines with dbt can be.

In this blog post I’m going to show off what I built with dbt, contrasting it to my previous hand-built method.

I also had heard of dbt but haven’t really spent the time to learn it because I’m not really a data engineer. But this tutorial has me interested in diving in further.

Leave a Comment

Spark Schema Inference in Production

Miles Cole shares some advice:

To show the impact I want to highlight a benchmark that included Fabric Spark on a single 19GB CSV input file (100M Contoso dataset, sales table) for the benchmark. While there were a number of issue with this benchmark that inadvertently make Spark appear to be slow, this is only focused on the impact of inferring schema and practical recommendations.

Read on to see a performance problem that schema inference brings up. I’d also want to mention the risk of data updates blowing up your well-laid plans as a risk. Schema inference is a double-edged sword: it can be convenient and open up new approaches to development, but can just as easily cause unexpected failures.

Leave a Comment

Optimizing Planned Availability Group Failover in SQL Server

Aaron Bertrand shares some advice:

Shaving even a handful of seconds from the process can improve the application and end user experience; it can also drastically reduce alert noise or, at least, how long alerts have to stay muted. There’s a lot of material out there about performing AG failovers correctly (no data loss), but far less that focuses on shortening the disruption window. The difference is usually some combination of redo volume, checkpoint behavior, open transactions, and secondary readiness.

I wanted to share some techniques I use to make planned failovers faster and more predictable. Some of these techniques are well documented, while others come from real-world patterns I’ve observed across many SQL Server environments. I’ll talk about what I do before, during, and after the failover to minimize disruption and increase the chance that end users are oblivious that anything happened.

Aaron provides several tips to help reduce the pain of failover.

Leave a Comment

An Overview of the Fabric Unified Admin Monitoring Tool

Rob Sewell lays out some information:

When you are responsible for a Microsoft Fabric tenant, it will not be very long before you are facing many questions.

Questions like:

  • How is my capacity being used?
  • Which workspaces are consuming the most resources?
  • What are my users actually doing?
  • When are my peak usage times?

You can scabble around in the Admin portal and try to piece together the answers, but it is a bit like trying to navigate a city with a paper map — you can get there eventually, but it is slow and painful, and you will probably miss some things along the way.

Read on to see how FUAM can help answer these sorts of questions.

Leave a Comment

An Introduction to Transactions in SQL Server

Paul Randal hits us with another blast from the past:

One of the most fundamental concepts in any relational database management system (RDBMS), such as SQL Server, is the transaction. During my consulting career, I’ve seen many instances of performance problems caused by developers not understanding how transactions work in SQL Server, so in this tutorial, I’ll explain what transactions are and why they’re necessary, plus some details of how they work in SQL Server. There are nuances to some of this when Accelerated Database Recovery (ADR) is in use – topics for future articles.

One thing that Paul does not point out here but has done in other places is just how bad the implementation is for nested transactions in SQL Server. They’re very limited in nature, so the best advice there would be not to use them at all.

Leave a Comment