Press "Enter" to skip to content

Month: January 2026

BIGINT Serial Columns in PostgreSQL

Elizabeth Christensen lays out an argument:

Lots of us started with a Postgres database that incremented with an id SERIAL PRIMARY KEY. This was the Postgres standard for many years for data columns that auto incremented. The SERIAL is a shorthand for an integer data type that is automatically incremented. However as your data grows in size, SERIALs and INTs can run the risk of an integer overflow as they get closer to 2 Billion uses.

We covered a lot of this in a blog post The Integer at the End of the Universe: Integer Overflow in Postgres a few years ago. Since that was published we’ve helped a number of customers with this problem and I wanted to refresh the ideas and include some troubleshooting steps that can be helpful. I also think that BIGINT is more cost effective than folks realize.

Click through for Elizabeth’s argument. I’d say that this is very similar for SQL Server, where I’m more inclined to create a BIGINT identity column, especially because I almost automatically apply page-level compression to tables so there’s not really a downside to doing this. Identity columns don’t have a domain, so there’s no domain-specific information like you’d get with a column such as Age; and with page-level compression, you’re not wasting space.

Leave a Comment

SQL Server 2025 CU1 Woes

Brent Ozar notes some problems:

SQL Server 2025 Cumulative Update 1 came out last week, and I was kinda confused by the release notes. They described a couple dozen fixed issues, and the list seemed really short for a CU1.

However, the more I dug into it, the weirder things got. For example, there were several new DMVs added – which is normally a pretty big deal, something to be celebrated in the release notes – but they weren’t mentioned in the release notes. One of the DMVs wasn’t even documented. So I didn’t blog to tell you about CU1, dear reader, because something about it seemed fishy.

Read on for a big deal.

Leave a Comment

Running PostgreSQL on Kubernetes

Umair Shahid digs into the arguments for and against:

“Should PostgreSQL run on Kubernetes too?”

The worst answers are the confident ones:

  • “Yes, because everything else is on Kubernetes.”
  • “No, because databases are special.”

Both are lazy. The right answer depends on what you’re optimizing for: delivery velocity, platform consistency, latency predictability, operational risk, compliance constraints, and, most importantly, who is on-call when things go sideways.

Click through for a detailed analysis. It’s a similar story in SQL Server:

Leave a Comment

Measuring Time to Display an Image in Power BI

Chris Webb breaks out the stopwatch:

Carrying on my series on troubleshooting Power BI performance problems with Performance Analyzer, another situation where a report may be slow even when the DAX queries it generates against the underlying semantic model are fast is when you have large images displayed in an Image visual. Let’s see an example.

Click through for that example. And maybe don’t plop in so many 25 MB images.

Leave a Comment

In Support of Ugly Code

John Cook defends (some) ugly code:

Ugly code may be very valuable, depending on why it’s ugly. I’m not saying that it’s good for code to be ugly, but that code that is already ugly may be valuable.

That something is ugly is typically a visceral reaction. But I try to tease out why I think code is ugly, as it can be for several reasons.

  • It’s not formatted well or consistently. That’s an easy fix for the most part.
  • Naming is inconsistent or contradictory. Depending on the tooling, this is a reasonably easy fix.
  • The logic is convoluted to me. This is where things get tricky. Is it convoluted because I don’t understand what’s going on? Or is it convoluted because the person who developed or maintained it didn’t understand something important? If it’s the former, I try (“try” being the operative word here) to bite my tongue and dig in deeper to understand it better. But if it’s the latter, I think that’s fair game for refactoring.

Younger me was all about rewriting and removing nasty, inefficient, ugly code. But older me realizes that only some nasty, inefficient, ugly code is actually bad. I still will heartily argue that code is a liability and that most code bases could make do with a spring cleaning. But it has to come from a place of understanding first. I have a lot more on the topic of technical debt in an essay I wrote a few years ago. And I did purposefully cut myself off at one point to be cute.

Leave a Comment

SSMS 22 Tabs vs Spaces

Koen Verbeeck comes in with a public service announcement:

I’m not trying to start up a debate whether you should use tabs or spaces when indenting code. Personally, I prefer spaces because when I copy the code to another editor the outlining of the code remains the same while with tabs it’s not always the case (looking at you, Word and Outlook). But I don’t want to hit the spacebar 4 times whenever I want to indent something, so I use the setting “insert spaces instead of tabs”:

Click through for a situation in which that might not happen correctly, as well as what you can do about it.

Leave a Comment

DATE in Oracle vs PostgreSQL

Akhil Reddy Banappagari performs a comparison:

Choosing a correct datatype mapping while migrating from Oracle to PostgreSQL is very important to avoid migration failures. Especially when we have date and time involved, it is very important to understand the behavior in both Oracle and PostgreSQL. In this article, we are going to discuss about DATE datatype in Oracle and PostgreSQL, and avoiding constraint violations while migrating from Oracle to PostgreSQL when DATE data type is involved.

I’d consider this a case where Oracle is the weird one.

Leave a Comment

Homomorphic Encryption in SQL Server

Sebastiao Pereira tries to preserve privacy:

Homomorphic encryption is a cryptographic algorithm that lets computations be performed directly on encrypted data without needing to decrypt it. This enables secure outsourcing of computations on sensitive data while preserving privacy.

Is it possible to have homomorphic encryption in SQL Server?

There’s a lot of effort and a CLR module involved, but it is possible. Now, my next question is, how well does it perform in practice? 10 patients is fine for a demonstration, but at what point does this tip over?

Leave a Comment

Writing Sparse Pandas DataFrames to S3

Pooja Chhabra tries a few things:

If you’ve worked with large-scale machine learning pipelines, you must know one of the most frustrating bottlenecks isn’t always found in the complexity of the model or the elegance of the architecture — it’s writing the output efficiently.

Recently, I found myself navigating a complex data engineering hurdle where I needed to write a massive Pandas sparse DataFrame — the high-dimensional output of a CountVectorizer — directly to Amazon S3. By massive, I mean tens of gigabytes of feature data stored in a memory-efficient sparse format that needed to be materialized as a raw CSV file. This legacy requirement existed because our downstream machine learning model was specifically built to ingest only that format, leaving us with a significant I/O challenge that threatened to derail our entire processing timeline.

Read on for two major constraints, a variety of false starts, and what eventually worked.

Leave a Comment