Press "Enter" to skip to content

Author: Kevin Feasel

Running PostgreSQL on Kubernetes

Umair Shahid digs into the arguments for and against:

“Should PostgreSQL run on Kubernetes too?”

The worst answers are the confident ones:

  • “Yes, because everything else is on Kubernetes.”
  • “No, because databases are special.”

Both are lazy. The right answer depends on what you’re optimizing for: delivery velocity, platform consistency, latency predictability, operational risk, compliance constraints, and, most importantly, who is on-call when things go sideways.

Click through for a detailed analysis. It’s a similar story in SQL Server:

Comments closed

Measuring Time to Display an Image in Power BI

Chris Webb breaks out the stopwatch:

Carrying on my series on troubleshooting Power BI performance problems with Performance Analyzer, another situation where a report may be slow even when the DAX queries it generates against the underlying semantic model are fast is when you have large images displayed in an Image visual. Let’s see an example.

Click through for that example. And maybe don’t plop in so many 25 MB images.

Comments closed

In Support of Ugly Code

John Cook defends (some) ugly code:

Ugly code may be very valuable, depending on why it’s ugly. I’m not saying that it’s good for code to be ugly, but that code that is already ugly may be valuable.

That something is ugly is typically a visceral reaction. But I try to tease out why I think code is ugly, as it can be for several reasons.

  • It’s not formatted well or consistently. That’s an easy fix for the most part.
  • Naming is inconsistent or contradictory. Depending on the tooling, this is a reasonably easy fix.
  • The logic is convoluted to me. This is where things get tricky. Is it convoluted because I don’t understand what’s going on? Or is it convoluted because the person who developed or maintained it didn’t understand something important? If it’s the former, I try (“try” being the operative word here) to bite my tongue and dig in deeper to understand it better. But if it’s the latter, I think that’s fair game for refactoring.

Younger me was all about rewriting and removing nasty, inefficient, ugly code. But older me realizes that only some nasty, inefficient, ugly code is actually bad. I still will heartily argue that code is a liability and that most code bases could make do with a spring cleaning. But it has to come from a place of understanding first. I have a lot more on the topic of technical debt in an essay I wrote a few years ago. And I did purposefully cut myself off at one point to be cute.

Comments closed

DATE in Oracle vs PostgreSQL

Akhil Reddy Banappagari performs a comparison:

Choosing a correct datatype mapping while migrating from Oracle to PostgreSQL is very important to avoid migration failures. Especially when we have date and time involved, it is very important to understand the behavior in both Oracle and PostgreSQL. In this article, we are going to discuss about DATE datatype in Oracle and PostgreSQL, and avoiding constraint violations while migrating from Oracle to PostgreSQL when DATE data type is involved.

I’d consider this a case where Oracle is the weird one.

Comments closed

SSMS 22 Tabs vs Spaces

Koen Verbeeck comes in with a public service announcement:

I’m not trying to start up a debate whether you should use tabs or spaces when indenting code. Personally, I prefer spaces because when I copy the code to another editor the outlining of the code remains the same while with tabs it’s not always the case (looking at you, Word and Outlook). But I don’t want to hit the spacebar 4 times whenever I want to indent something, so I use the setting “insert spaces instead of tabs”:

Click through for a situation in which that might not happen correctly, as well as what you can do about it.

Comments closed

Homomorphic Encryption in SQL Server

Sebastiao Pereira tries to preserve privacy:

Homomorphic encryption is a cryptographic algorithm that lets computations be performed directly on encrypted data without needing to decrypt it. This enables secure outsourcing of computations on sensitive data while preserving privacy.

Is it possible to have homomorphic encryption in SQL Server?

There’s a lot of effort and a CLR module involved, but it is possible. Now, my next question is, how well does it perform in practice? 10 patients is fine for a demonstration, but at what point does this tip over?

Comments closed

Writing Sparse Pandas DataFrames to S3

Pooja Chhabra tries a few things:

If you’ve worked with large-scale machine learning pipelines, you must know one of the most frustrating bottlenecks isn’t always found in the complexity of the model or the elegance of the architecture — it’s writing the output efficiently.

Recently, I found myself navigating a complex data engineering hurdle where I needed to write a massive Pandas sparse DataFrame — the high-dimensional output of a CountVectorizer — directly to Amazon S3. By massive, I mean tens of gigabytes of feature data stored in a memory-efficient sparse format that needed to be materialized as a raw CSV file. This legacy requirement existed because our downstream machine learning model was specifically built to ingest only that format, leaving us with a significant I/O challenge that threatened to derail our entire processing timeline.

Read on for two major constraints, a variety of false starts, and what eventually worked.

Comments closed

When Wide Queries Become Slow in SQL Server

Kendra Little talks baggage:

I see this pattern repeatedly: a “wide” query that returns many columns and less than 100k rows runs slowly. SQL Server gets slow when it drags large amounts of baggage through the entire query plan, like a solo traveler struggling with massive suitcases in an airport instead of picking them up close to their destination.

SQL Server often minimizes data access by grabbing all the columns it needs early in query execution, then doing joins and filters. This means presentation columns get picked up early.

Read on to see the effects of this, as well as what you can do to mitigate the issue.

Comments closed

A PostgreSQL Query Plan that Changes without Data or Stats Changes

Frederic Yhuel troubleshoots an issue:

We recently encountered a strange optimizer behaviour, reported by one of our customers:

Customer: “Hi Dalibo, we have a query that is very slow on the first execution after a batch process, and then very fast. We initially suspected a caching effect, but then we noticed that the execution plan was different.”

Dalibo: “That’s a common issue. Autoanalyze didn’t have the opportunity to process the table after the batch job had finished, and before the first execution of the query. You should run the VACUUM ANALYZE command (or at least ANALYZE) immediately after your batch job.”

Customer: “Yes, it actually solves the problem, but… your hypothesis is wrong. We looked at pg_stat_user_tables, and are certain that the tables were not vacuumed or analyzed between the slow and fast executions. We don’t have a production problem, but we would like to understand.”

Dalibo: “That’s very surprising! we would also like to understand…”

So let’s dive in!

Read on for a description of the issue and what Frederic and team found.

Comments closed