Press "Enter" to skip to content

Category: Internals

The Postgres Portal: Executor and Process Utility

Cary Huang digs into Postgres:

I wrote another blog to briefly explain the responsibility of each query processing stage. You can find it here. In this blog. There is one that focuses primarily on the planner module that you can find here. In this blog, we will focus on the executor part of the query processing, which is the entry point to “communicate” with other PostgreSQL internal modules to collect the correct data for the query.

In fact, executor may not be the best term to describe this stage. In the PostgreSQL source code, it is actually “packaged” in another object called “portal”. Often, and it has 2 potential paths to take, the “executor” path, or the “process utility” path. Very often, we just call this stage as “executor” rather than “portal”, because executor handles most of the DML query types that involve SELECT, INSERT, UPDATE, DELETE…etc and it has to process them according to the query plan created by the “planner” module.

Read on to learn more about the portal and its two component parts.

Comments closed

Optimized Locking in Azure SQL Database

Bob Pusateri locks the door:

One of the newer features introduced in Azure SQL Database is “Optimized Locking”. But what does that mean, and how does it compare to the apparently “un-optimized” locking the SQL relational engine has been using up to this point? Let’s dive in and take a look.

Read on to learn more about optimized locking and how it compares to the traditional form of locking available in SQL Server today.

Comments closed

A Deep Dive into Covering and Non-Covering Indexes

Etienne Lopes goes deeper than I thought:

A few days ago a client brought me a question, he had two tables with the same data, one in each database (same SQL version and compatibility level in different environments with similar infrastructure and configurations). Both tables had an extremely selective column and both had indexes in that column. The thing is that when running a particular query on one Database, let’s call it DB1 here, it was really fast whereas running the same query on the other database (DB2) it was a lot slower. The subset of rows that matched the WHERE clause was the same and it was really small in both cases (since the tables had the same data).

“Ok, let’s have a look at that”, I said, and when doing so, the query running on DB1 was doing a Clustered Index Seek whereas the same query in DB2 was doing a Table Scan. Didn’t I say that both tables had indexes in that column?

Read on for the post. Etienne warns that it’s a big boy of a post, and it covers a lot, including “seek” operations that are scans in disguise, index levels, and more.

Comments closed

SQL Server Drivers and Access Tokens for Connection Pooling

David Engel gives us a deep dive:

We often get support escalations related to failures around expired access tokens when using Microsoft Entra ID (formerly Azure Active Directory) authentication. There is a lot of nuance in the various drivers about how and when access tokens are used and when they get renewed after expiring. I’m going to do a deep-dive and try to explain some of the details here.

This post is mainly targeted at people who have to troubleshoot issues in this area or those who are building large services and need to understand the impact of their authentication choices.

Read on to learn more about how the ODBC, JDBC, and .NET drivers differ.

Comments closed

PostgreSQL Row Visibility Indicators

Cary Huang explains how row visibility works in Postgres:

Simply put, the visibility refers to whether a row of data (Heap Tuple by default) should be displayed to the user in certain states, backend processes, or transactions.

For example,

  • A user changes a data record from ‘A’ to ‘B’ via UPDATE
  • PostgreSQL handles this by INSERTing a new record call ‘B’ while marking record ‘A’ as ‘invisible’
  • Both records exist in the system, but ‘B’ is visible, while ‘A’ is not.
  • A deleted or invisible record is also referred as a ‘dead’ tuple
  • One of the responsibilities of VACUUM process is to clear these ‘dead’ tuples to free up space.

Read on to learn more about various visibility indicators and how they ultimately tell us whether a row should be visible or not.

Comments closed

Processes in PostgreSQL

Semab Tariq continues a series on internals in PostgreSQL:

PostgreSQL is a client/server type relational database management system. It has a multi-process architecture that runs on a single host. A collection of multiple processes that manage a database cluster is usually referred to as a PostgreSQL server. In PostgreSQL, every operation is treated as a process, and each action we undertake within PostgreSQL follows an append-only approach. This means that every time we execute an action such as an insert, update, or delete, a new tuple is created rather than modifying the existing data directly.

Consequently, PostgreSQL does not execute updates or deletes in place. Instead, it appends new data or marks existing data as obsolete. This append-only methodology ensures data integrity and allows for efficient management of database changes over time.

Read on to learn more about how these processes work.

Comments closed

Working with the Schema Registry in Confluent

Italo Nesi shows off the schema registry:

If you are new to Schema Registry or don’t know the difference between schema, schema type, subject, compatibility type, schema ID, and subject version, I would recommend starting with this free course: Schema Registry 101 by Danica Fine.

This article will show the bits and bytes of what happens behind the scenes in Apache Kafka® producer and consumer clients when communicating with the Schema Registry and serializing/deserializing messages.

Read on to learn more about data quality rules and how the schema registry works.

Comments closed

An Overview of Postgres’s Planner

Cary Huang digs into one phase of Postgres query processing:

When you send a query to PostgreSQL, it normally would go through stages of query processing and return you the results at the end. These stages are known as:

  • Parse
  • Analyze
  • Rewrite
  • Plan
  • Execute

I wrote another blog to briefly explain the responsibility of each query processing stage. You can find it here. In this blog, we will only focus on the “plan” stage or the “planner” module as this is perhaps the most interesting or complex stage if you will. I will share my understanding of the planner module as I investigate its internal workings to handle a simple sequential scan. This will be based on PostgreSQL 16.

Read on to learn what the planner does and how it works, at a high level.

Comments closed

Postgres Internals: Database Clusters, Databases, and Tables

Semab Tariq begins a new series:

A database cluster is a collection of multiple databases managed by a single PostgreSQL server. It can be referred to as a data/base directory.

A database is a collection of database objects. Whereas a database object is a data structure used to store objects such as tables, views, indexes, extensions, Sequences functions, etc. In simple words, anything that we can create or store within a database is a database object

Read on to learn more about how Postgres lays out database files and tablespaces.

Comments closed