Press "Enter" to skip to content

Category: Internals

What Happens upon Row Modification in Postgres

Semab Tariq takes us on a journey:

But behind the scenes, it’s more complex to ensure that data remains consistent and accessible. In today’s blog, I’ll answer some frequently asked questions from our customers and dive into why PostgreSQL relies on a process called VACUUM to efficiently manage data updates.

Updating a row in PostgreSQL isn’t as straightforward as directly changing the existing data. Instead, PostgreSQL avoids in-place updates, meaning it doesn’t overwrite rows directly. 

But what does this actually mean? 

Read on to learn what that actually means.

Leave a Comment

Column Eviction in Power BI and Direct Lake

Paul Turley talks about fashion:

One of the core best practice guidance principals for Power BI modeling is to avoid including columns that aren’t absolutely necessary for analytic reporting. Every column uses precious memory and especially long, unique values that don’t compress very well. When consulting clients bring me large models that require expensive capacity licensing and pose report performance issues, my first inclination is to see what column data can be carved out of the model; and perhaps moved to another table for a drill-through report.

The product team came up with a very clever way to reduce the in-memory footprint of a Direct Lake semantic model: hold a popularity contest! The semantic model engine will only keep columns in memory based on their hotness. I mean this literally…

Read on to learn a bit more about the algorithm in play and how it differs from a naive Least Recently Used cache.

Leave a Comment

Value Filter Behavior in Power BI

Jeffrey Wang digs into a new feature:

The October 2024 Power BI update introduces an inconspicuous yet significant preview feature: Value Filter Behavior. This feature is activated by setting a new model-level property, ValueFilterBehavior, to Independent. The default setting of Automatic preserves the existing behavior, at least during the public preview period. This property controls how the DAX SUMMARIZECOLUMNS function behaves, which is central to most DAX queries generated by Power BI visuals.

Don’t just take my world for it — create any Power BI visual by adding columns, filters, and measures. If you are familiar with the Performance Analyzer or other tools that capture the DAX query issued by the visual, you will see something like this:

Read on for Jeffrey’s example and a dive into what’s going on.

Comments closed

Preparing a Fetch Operation in a Kafka Consumer

Danica Fine continues a series on Kafka internals:

Welcome back to the third installment of our blog series where we’re diving into the beautiful black box that is Apache Kafka® to better understand how we interact with the cluster through producer and consumer clients.

Earlier in the series, we took a look at the Kafka producer to see how the client works before following a produce request as it’s processed by the cluster.

In this post, we’ll switch our attention to Kafka Consumer clients to see how consumers interact with the brokers, coordinate their partitions, and send requests to read data from your Kafka topics.

Read on to see what it takes for a consumer to operate in Apache Kafka.

Comments closed

Transforming Queries Based on Human Intent

Andrei Lepikhov and Alena Rybakina ask a question:

As usual, this project was prompted by multiple user reports with typical complaints, like ‘SQL server executes the query times faster’ or ‘Postgres doesn’t pick up my index’. The underlying issue that united these reports was frequently used VALUES sequences, typically transformed in the query tree into an SEMI JOIN.

I also want to argue one general question: Should an open-source DBMS correct user errors? I mean optimising a query even before the search for an optimal plan begins, eliminating self-joins, subqueries, and simplifying expressions – everything that can be achieved by proper query tuning. The question is not that simple since DBAs point out that the cost of query planning in Oracle overgrows with the complexity of the query text, which is most likely caused, among other things, by the extensive range of optimisation rules.

My short answer is, yes. SQL is a 4th generation language, meaning that end users describe the results they need but leave it to the engine to determine how to get there. As performance tuners, we may understand some of the foibles of the database engine and how it does (or does not) perform these translations, but in an ideal world, every unique representation of an end state for a given query should have the same, maximally optimized internal way of getting there. This is impossible in practice, but it should be a guiding principle for engine behavior.

Comments closed

The Internals of Data Updates in PostgreSQL

Cary Huang explains how update operations work:

In previous blogs, we talked about an overview of PostgreSQL’s table access method API here , how sequential scan is handled within this API here, and how data insertion is handled here. Today in this blog, we will look closely into how PostgreSQL handles update. A successful update in PostgreSQL can be viewed as “insert a new record” while “marking the old record as invisible” due to the MVCC technique that PostgreSQL employs. It sounds simple enough, but there are quite a lot of considerations in place to make a successful update. Let’s dive in.

There’s a lot going on behind the scenes, and Cary does a good job of explaining it all.

Comments closed

Kafka Internals: Handling a Producer Request

Danica Fine continues a series on Kafka internals:

Welcome to the second installment of our blog series to understand the inner workings of the beautiful black box that is Apache Kafka®. 

We’re diving headfirst into Kafka to see how we actually interact with the cluster through producers and consumers. Along the way, we explore the configurations that affect each step of this epic journey and the metrics that we can use to more effectively monitor the process. 

In the last blog, we explored what the Kafka producer client does behind the scenes each time we call producer.send() (or similar, depending on your language of choice). In this post, we follow our brave hero, a well-formed produce request, that’s on its way to the broker to be processed and have its data stored on the cluster.

Click through to learn more about how it all works.

Comments closed

Self-Joins and Halloween Protection

Paul White has an explanation:

I was asked recently why Halloween Protection was needed for data modification statements that include a self-join of the target table. This gives me a chance to explain, while also covering some interesting product bug history from the SQL Server 7 and 2000 days.

Read on for that explanation.

Paul’s explanation of the bug reminded me of the “quirky update” approach to building a running total, except that, instead of fixing a bug that eliminated it, the process always remained on a knife’s edge of “unsupported but works…at least until we change something and it doesn’t work anymore.”

Comments closed

The State of the ANY Aggregate Transformation

Paul White covers an aggregate operator:

SQL Server provides a way to select any one row from a group of rows, provided you write the statement using a specific syntax. This method returns any one row from each group, not the minimum, maximum or anything else. In principle, the one row chosen from each group is unpredictable.

The general idea of the required syntax is to logically number rows starting with 1 in each group in no particular order, then return only the rows numbered 1. The outer statement must not select the numbering column for this query optimizer transformation (SelSeqPrjToAnyAgg) to work.

Read on for information about this internal operator, a bug that existed in it for a long time, and the current state of fixes.

Comments closed