Internals – Page 9 – Curated SQL

PostgreSQL Row Visibility Indicators

Published 2024-04-23 by Kevin Feasel

Cary Huang explains how row visibility works in Postgres:

Simply put, the visibility refers to whether a row of data (Heap Tuple by default) should be displayed to the user in certain states, backend processes, or transactions.

For example,

A user changes a data record from ‘A’ to ‘B’ via UPDATE

PostgreSQL handles this by INSERTing a new record call ‘B’ while marking record ‘A’ as ‘invisible’

Both records exist in the system, but ‘B’ is visible, while ‘A’ is not.

A deleted or invisible record is also referred as a ‘dead’ tuple

One of the responsibilities of VACUUM process is to clear these ‘dead’ tuples to free up space.

Read on to learn more about various visibility indicators and how they ultimately tell us whether a row should be visible or not.

Comments closed

Processes in PostgreSQL

Published 2024-04-19 by Kevin Feasel

Semab Tariq continues a series on internals in PostgreSQL:

PostgreSQL is a client/server type relational database management system. It has a multi-process architecture that runs on a single host. A collection of multiple processes that manage a database cluster is usually referred to as a PostgreSQL server. In PostgreSQL, every operation is treated as a process, and each action we undertake within PostgreSQL follows an append-only approach. This means that every time we execute an action such as an insert, update, or delete, a new tuple is created rather than modifying the existing data directly.

Consequently, PostgreSQL does not execute updates or deletes in place. Instead, it appends new data or marks existing data as obsolete. This append-only methodology ensures data integrity and allows for efficient management of database changes over time.

Read on to learn more about how these processes work.

Comments closed

Working with the Schema Registry in Confluent

Published 2024-04-11 by Kevin Feasel

Italo Nesi shows off the schema registry:

If you are new to Schema Registry or don’t know the difference between schema, schema type, subject, compatibility type, schema ID, and subject version, I would recommend starting with this free course: Schema Registry 101 by Danica Fine.

This article will show the bits and bytes of what happens behind the scenes in Apache Kafka® producer and consumer clients when communicating with the Schema Registry and serializing/deserializing messages.

Read on to learn more about data quality rules and how the schema registry works.

Comments closed

Page Structure in Postgres

Published 2024-03-25 by Kevin Feasel

Semab Tariq continues a series on Postgres internals:

Welcome to the second part of our PostgreSQL internals blog series. In the first part, we delved into database clusters and their physical structure.

In part two, we’ll explore the internal structure of a page in PostgreSQL. If you still need to check out part one, you can find it here.

A lot of this is similar to SQL Server, though it’s not quite the same.

Comments closed

An Overview of Postgres’s Planner

Published 2024-03-25 by Kevin Feasel

Cary Huang digs into one phase of Postgres query processing:

When you send a query to PostgreSQL, it normally would go through stages of query processing and return you the results at the end. These stages are known as:

Parse

Analyze

Rewrite

Plan

Execute

I wrote another blog to briefly explain the responsibility of each query processing stage. You can find it here. In this blog, we will only focus on the “plan” stage or the “planner” module as this is perhaps the most interesting or complex stage if you will. I will share my understanding of the planner module as I investigate its internal workings to handle a simple sequential scan. This will be based on PostgreSQL 16.

Read on to learn what the planner does and how it works, at a high level.

Comments closed

Postgres Internals: Database Clusters, Databases, and Tables

Published 2024-03-18 by Kevin Feasel

Semab Tariq begins a new series:

A database cluster is a collection of multiple databases managed by a single PostgreSQL server. It can be referred to as a data/base directory.

A database is a collection of database objects. Whereas a database object is a data structure used to store objects such as tables, views, indexes, extensions, Sequences functions, etc. In simple words, anything that we can create or store within a database is a database object

Read on to learn more about how Postgres lays out database files and tablespaces.

Comments closed

Hash Match and Stream Aggregate Operators in SQL Server

Published 2024-03-13 by Kevin Feasel

Andy Brownsword rounds up the usual suspects:

In the last post we looked at how TOP and MAX operators compared. We saw the execution plan for a MAX function used a Stream Aggregate operator which is one of two which we can use for aggregation

I wanted to look at the two operators and how they perform the same tasks in different ways. The way they function is key to understanding why the engine may choose to use one over the other and the impact this can have on the performance of a query.

The two operators in question: the Hash Match (Aggregate) and Stream Aggregate

Read on for a discussion of how each operator works and when each makes sense for the optimizer to use.

Comments closed

Hash Aggregates and Hash Joins in Postgres

Published 2024-01-19 by Kevin Feasel

Muhammad Ali plays matchmaker:

PostgreSQL employs various techniques for data joining and aggregation in its queries, among which the hash-based method stands out for its efficiency in particular situations and different data sizes. We will discuss hash joins and hash aggregates in PostgreSQL, providing insights on how they work and parameters which influence this algorithm.

Read on to learn more. This looks fundamentally similar to hash matches in SQL Server, so if you’re familiar with that, the concepts should be pretty clear.

Comments closed

Fast-Track Optimization and Window Functions

Published 2023-11-24 by Kevin Feasel

Hugo Kornelis digs into a performance improvement the SQL Server optimizer has for window functions:

This is part twenty-four of the plansplaining series. In the previous part, I explained the execution plans for basic window functions, with and without a window frame. Especially the latter group performed quite poorly in the examples. So let’s now look at an optimization that SQL Server can apply to most cases, that prevents this rather bad scaling.

Click through to see what this is, how it works, and when the optimizer is able to use it.

Comments closed

Execution Plans for Window Functions

Published 2023-11-16 by Kevin Feasel

Hugo Kornelis digs in:

In his invitation, Steve specifically asks for examples where Window functions provided a neat solution to a real world problem. Well, sorry Steve, but I am not going to do that. But your invitation did inspire to me write about the execution plans for these window functions. And there is, in fact, so much to write about it, that this is just the first part.

So that makes this post not only a T-SQL Tuesday contribution, but also part 23 in my ongoing plansplaining series: blog posts where I take an in depth look at execution plans to explain how exactly they work, and point out often overlooked details. In this post, I will look at the basics of window functions, as they have existed for over 10 years now. I will point out a few interesting performance gotchas. And there are some links to feedback items that you can upvote (if you are so inclined) to pressure Microsoft to make some changes to the optimizer and the execution engine, to get some small but still welcome performance gains.

Read on to get an in-depth understanding of how those window functions actually operate once the database engine gets ahold of your query.

Comments closed

M	T	W	T	F	S	S
1	2	3	4	5	6	7
8	9	10	11	12	13	14
15	16	17	18	19	20	21
22	23	24	25	26	27	28
29	30

Category: Internals