Press "Enter" to skip to content

Curated SQL Posts

Scaling with PostgreSQL

Shayon Mukherjee shares some tips about scaling in PostgreSQL:

“Postgres was great when we started but now that our service is being used heavily we are running into a lot of ‘weird’ issues”

This sentiment is frequently echoed by CTOs and senior engineers at high-growth startups when I speak with them.

Scaling PostgreSQL successfully doesn’t always require a full team of DBAs and experts. The beauty of PostgreSQL is that solutions often lie within the database itself – by rethinking your data access patterns from first principles, you can solve many business problems at scale.

In this post, I’ll address some common “weird” issues I’ve encountered and explore solutions that work at scale.

I like some of them, though I’m not a fan of eliminating or delaying foreign key constraints, as those are important for data quality. I’m not knowledgeable enough in PostgreSQL administration to have a strong opinion on these, however.

Leave a Comment

Inflation in Medieval China

Richard Vale digs into a dataset:

In this post, I would like to draw attention to a very interesting data set collected by Guan, Palma and Wu as part of the replication package for their paper The rise and fall of paper money in Yuan China, 1260-1368. The paper describes inflation, money and prices during the Yuan Dynasty era in China.

First, a little historical background.

Read on for the analysis. H/T R-Bloggers.

Leave a Comment

Experimenting with BIT_COUNT

Louis Davidson has an idea:

I was editing an article the other day that uses the BIT_COUNT function that was added to SQL Server 2022. The solution presented is excellent, and I will try to come back and link to it here when I remember. (It will be linked the other way.

Reading that did two things for me. First it cave me an idea of how the BIT_COUNT function might be actually be used in a useful way. The solution that was presented would only work in SQL Server 2022 (It will work in earlier compatibility levels, based on the tests I have done.)

Read on for what Louis tried out.

Leave a Comment

Using the Microsoft Fabric Capacity Metrics App

Reitse Eskens uses a tool:

In a number of previous blogs and in my session on loadtesting Microsoft Fabric, I’ve always questioned the metrics app and one specific point is the timepoint detail. When you click on a graph, you get the option to go to the timepoint detail and read more.

This is all fun and games but looking at the list of active processes at that specific point in time, you’ll quickly see processes that are way out of the selected point in time. For me, it rendered this thing useless because it messed up the things I wanted to see.

Read on to see the right way to handle this app.

Leave a Comment

Inlining Views in PostgreSQL

Radim Marek shows off how the PostgreSQL database engine can inline a view:

Database VIEWs are powerful tools that often don’t get the attention they deserve when building database-driven applications. They make our database work easier in several ways:

  • They let us reuse common query patterns instead of writing them over and over
  • They give us a place to define business rules once and use them everywhere
  • They help us write cleaner, more organized queries

Let’s see how this works with a practical example.

I’m not sure how well PostgreSQL manages nested views, as the biggest problem in SQL Server with inlining views comes when you have overly complicated views made up of views joined to views made up of views joined to–well, you get the idea. That said, Radim does point out a variety of scenarios in which inlining doesn’t happen in PostgreSQL.

Leave a Comment

Farewell, Azure Data Studio

Deb Melkin says goodbye:

Interesting news dropped – Microsoft announced that Azure Data Studio is going to be retired in 2026. You can read the official announcement here. In some ways, I have to admit that I was a little surprised to see this.

I’ve been a very regular user of Azure Data Studio over the past several years, especially because it was the option for Linux. I believe the most likely reason they’re deprecating the product is because there’s a lot of effort in keeping up with Visual Studio Code’s changes and forking that code base every month for Azure Data Studio.

My concern is that the mssql extension for Visual Studio Code is nowhere near ready for prime time, and I wonder if it will be sufficiently capable by the time Azure Data Studio is gone for good.

1 Comment

Handling Errors in Apache Flink Apps

Alexis Tekin and Jeremy Ber handle an error:

Data streaming applications continuously process incoming data, much like a never-ending query against a database. Unlike traditional database queries where you request data one time and receive a single response, streaming data applications constantly receive new data in real time. This introduces some complexity, particularly around error handling. This post discusses the strategies for handling errors in Apache Flink applications. However, the general principles discussed here apply to stream processing applications at large.

Read on to see how this all works when you’re hosting a Flink application. This directly relates to Flink applications that live in AWS, though very little in the article is AWS-specific.

Leave a Comment

Understanding Error Severity Levels 19 through 25 in SQL Server

Steve Stedman handles an error:

SQL Server uses error severity levels to indicate the nature and seriousness of an error. These severity levels range from 0 to 25, with each level indicating a different type of issue, from informational messages to catastrophic failures. In this post, we’ll focus on severity levels 19 through 25, which are typically the most critical errors that can occur in SQL Server.

Fortunately, given how devastating they are, severity level 21 and higher are extremely rare in my experience.

Leave a Comment

The Power of an Error Message

Joey D’Antoni shares some thoughts on error messages:

I’ve been working in technology for entirely too long—but one of the first, and most important lessons I learned was try to identify the correct error message, in the right log file, and then it can be generally trivial to resolve that error message. The challenge to forever and ever always has been “finding the right error message”, as what typically happens is that once there is one failure in a computer program, there is a series of cascading errors, which makes it harder to identify the right error message that you need to copy and paste into google/stack overflow/chatgpt/copilot.

Read on for more. I think it’s easy for us to rag on bad error messages, though even opaque error codes can still be reasonable if there’s a comprehensive guide. This was a classic problem with blue screens: oh, I have an error code 0x2378f83f. So what does 0x2378f83f mean? Dunno, nobody else on the internet has apparently ever had that code. And Joey also brings up my favorite error message to rag on: “String or binary data would be truncated.” Sadly, although they did finally add column name and message text to the error, they never fixed the most galling mistake in that error message: the passive voice.

Leave a Comment