Internals – Curated SQL

Working with the Schema Registry in Confluent

Published 2024-04-11 by Kevin Feasel

Italo Nesi shows off the schema registry:

If you are new to Schema Registry or don’t know the difference between schema, schema type, subject, compatibility type, schema ID, and subject version, I would recommend starting with this free course: Schema Registry 101 by Danica Fine.
This article will show the bits and bytes of what happens behind the scenes in Apache Kafka® producer and consumer clients when communicating with the Schema Registry and serializing/deserializing messages.

Read on to learn more about data quality rules and how the schema registry works.

Page Structure in Postgres

Published 2024-03-25 by Kevin Feasel

Semab Tariq continues a series on Postgres internals:

Welcome to the second part of our PostgreSQL internals blog series. In the first part, we delved into database clusters and their physical structure.
In part two, we’ll explore the internal structure of a page in PostgreSQL. If you still need to check out part one, you can find it here.

A lot of this is similar to SQL Server, though it’s not quite the same.

Comments closed

An Overview of Postgres’s Planner

Published 2024-03-25 by Kevin Feasel

Cary Huang digs into one phase of Postgres query processing:

When you send a query to PostgreSQL, it normally would go through stages of query processing and return you the results at the end. These stages are known as:
Parse
Analyze
Rewrite
Plan
Execute
I wrote another blog to briefly explain the responsibility of each query processing stage. You can find it here. In this blog, we will only focus on the “plan” stage or the “planner” module as this is perhaps the most interesting or complex stage if you will. I will share my understanding of the planner module as I investigate its internal workings to handle a simple sequential scan. This will be based on PostgreSQL 16.

Read on to learn what the planner does and how it works, at a high level.

Comments closed

Postgres Internals: Database Clusters, Databases, and Tables

Published 2024-03-18 by Kevin Feasel

Semab Tariq begins a new series:

A database cluster is a collection of multiple databases managed by a single PostgreSQL server. It can be referred to as a data/base directory.
A database is a collection of database objects. Whereas a database object is a data structure used to store objects such as tables, views, indexes, extensions, Sequences functions, etc. In simple words, anything that we can create or store within a database is a database object

Read on to learn more about how Postgres lays out database files and tablespaces.

Comments closed

Hash Match and Stream Aggregate Operators in SQL Server

Published 2024-03-13 by Kevin Feasel

Andy Brownsword rounds up the usual suspects:

In the last post we looked at how TOP and MAX operators compared. We saw the execution plan for a MAX function used a Stream Aggregate operator which is one of two which we can use for aggregation
I wanted to look at the two operators and how they perform the same tasks in different ways. The way they function is key to understanding why the engine may choose to use one over the other and the impact this can have on the performance of a query.
The two operators in question: the Hash Match (Aggregate) and Stream Aggregate

Read on for a discussion of how each operator works and when each makes sense for the optimizer to use.

Comments closed

Hash Aggregates and Hash Joins in Postgres

Published 2024-01-19 by Kevin Feasel

Muhammad Ali plays matchmaker:

PostgreSQL employs various techniques for data joining and aggregation in its queries, among which the hash-based method stands out for its efficiency in particular situations and different data sizes. We will discuss hash joins and hash aggregates in PostgreSQL, providing insights on how they work and parameters which influence this algorithm.

Read on to learn more. This looks fundamentally similar to hash matches in SQL Server, so if you’re familiar with that, the concepts should be pretty clear.

Comments closed

Fast-Track Optimization and Window Functions

Published 2023-11-24 by Kevin Feasel

Hugo Kornelis digs into a performance improvement the SQL Server optimizer has for window functions:

This is part twenty-four of the plansplaining series. In the previous part, I explained the execution plans for basic window functions, with and without a window frame. Especially the latter group performed quite poorly in the examples. So let’s now look at an optimization that SQL Server can apply to most cases, that prevents this rather bad scaling.

Click through to see what this is, how it works, and when the optimizer is able to use it.

Comments closed

Execution Plans for Window Functions

Published 2023-11-16 by Kevin Feasel

Hugo Kornelis digs in:

In his invitation, Steve specifically asks for examples where Window functions provided a neat solution to a real world problem. Well, sorry Steve, but I am not going to do that. But your invitation did inspire to me write about the execution plans for these window functions. And there is, in fact, so much to write about it, that this is just the first part.
So that makes this post not only a T-SQL Tuesday contribution, but also part 23 in my ongoing plansplaining series: blog posts where I take an in depth look at execution plans to explain how exactly they work, and point out often overlooked details. In this post, I will look at the basics of window functions, as they have existed for over 10 years now. I will point out a few interesting performance gotchas. And there are some links to feedback items that you can upvote (if you are so inclined) to pressure Microsoft to make some changes to the optimizer and the execution engine, to get some small but still welcome performance gains.

Read on to get an in-depth understanding of how those window functions actually operate once the database engine gets ahold of your query.

Comments closed

Why Batch Mode Sort Spills are Slow

Published 2023-11-13 by Kevin Feasel

Paul White unravels a mystery:

Batch mode sorting was added to SQL Server in the 2016 release under compatibility level 130. Most of the time, a batch mode sort will be much faster than the row mode equivalent.
This post is about an important exception to this rule, as recently reported by Erik Darling (video).
No doubt you’ll visit both links before reading on, but to summarize, the issue is that batch mode sorts are very slow when they spill—much slower than an equivalent row mode sort.

Read the whole thing. Paul does a great job illuminating us.

Comments closed

Packed/Batched Remote Procedure Calls in SQL Server

Published 2023-10-27 by Kevin Feasel

Bob Dorr digs in:

The SQL Server TDS protocol provides two main paths for query execution (Language and RPC events.) You can trace these events using the Batch::Starting/Completed (Language) and RPC:Starting/Completed (RPC) XEvents.
Language events are text streams requiring full parsing and may be susceptible to injection attacks. Language events also require educated guesses. For example, should select 1 return a smallint, bigint?

Bob goes on to show an example of a simple call being susceptible to SQL injection, explains why Remote Procedure Calls (RPCs) are superior, and what packing (or batching) RPCs does for you.

Comments closed

Category: Internals

Working with the Schema Registry in Confluent

Page Structure in Postgres

An Overview of Postgres’s Planner

Postgres Internals: Database Clusters, Databases, and Tables

Hash Match and Stream Aggregate Operators in SQL Server

Hash Aggregates and Hash Joins in Postgres

Fast-Track Optimization and Window Functions

Execution Plans for Window Functions

Why Batch Mode Sort Spills are Slow

Packed/Batched Remote Procedure Calls in SQL Server

Language events are text streams requiring full parsing and may be susceptible to injection attacks. Language events also require educated guesses. For example, should select 1 return a smallint, bigint?