Internals – Page 4 – Curated SQL

Vacuum, MERGE, and ON CONFLICT Directives

Published 2024-09-06 by Kevin Feasel

I previously blogged about ensuring that the “ON CONFLICT” directive is used in order to avoid vacuum from having to do additional work. You can read the original blog here: Reduce Vacuum by Using “ON CONFLICT” Directive

Now that Postgres has incorporated the “MERGE” functionality into Postgres 15 and above, I wanted to ensure that there was no “strange” behavior as it relates to vacuum when using merge.

Read on to see how things look now.

Comments closed

Search in a Vector Database

Published 2024-09-05 by Kevin Feasel

Brendan Tierney continues a series on vector databases:

Searching semantic similarity in a data set is now equivalent to searching for nearest neighbors in a vector space instead of using traditional keyword searches using query predicates. The distance between “dog” and “wolf” in this vector space is shorter than the distance between “dog” and “kitten”. A “dog” is more similar to a “wolf” than it is to a “kitten”.

Click through to learn more about some of the common techniques for performing similarity searches against vectorized data.

Comments closed

Performance Profile of Fast-Forward Cursors

Published 2024-09-03 by Kevin Feasel

Hugo Kornelis continues a deep dive into cursors:

One of the things that has always bothered me about the fast forward cursor type is the shocking lack of documentation of what it does exactly. Okay, the name suggests that it is fast. But is it really? When I first looked at cursor performance (granted, a long time ago), I found that a static cursor was actually faster than a fast forward cursor for the same query. So… fastish forward?

The name also suggests that this cursor is forward only. That is indeed the case. At least they got that right in the naming.

Read on to learn more about this cursor property, as well as how it compares to static and dynamic cursors. Hugo ends on a spicy take you won’t want to miss.

Comments closed

Write-Ahead Logging in PostgreSQL

Published 2024-08-21 by Kevin Feasel

Semab Tariq continues a series on internals by covering write-ahead logging:

Write-Ahead Logging (WAL) is a critical feature in PostgreSQL that ensures data integrity and durability. It records changes to the database before they are applied, allowing PostgreSQL to recover data and restore the database to its most recent state in case of a crash or failure.

Before the introduction of Write-Ahead Logging (WAL) in PostgreSQL, PostgreSQL relied on a simpler mechanism for ensuring data integrity, which was less robust and did not support advanced features like point-in-time recovery and replication.

Read on for a bit of history, how write-ahead logging has changed over the years, and the basics of how it works. Write-ahead logging is a common feature in major relational databases and serves a critical purpose for database consistency. There are also some sneaky performance gains you can get out of it.

Comments closed

Query Re-Optimization in Postgres

Published 2024-08-20 by Kevin Feasel

Andrei Lepikhov walks through an interesting scenario:

What was the impetus to begin this work? It was caused by many real cases that may be demonstrated clearly by the Join Order Benchmark. How much performance do you think Postgres loses if you change its preference of employing parallel workers from one to zero? Two times regression? What about 10 or 100 times slower?

The black line in the graph below shows the change in execution time of each query between two cases: with parallel workers disabled and with a single parallel worker per gather allowed. For details, see the test script and EXPLAINs, with and without parallel workers.

Click through for an overview of what Andrei wrote, including architectural notes. But stick around until the end to see just how difficult the challenge is to re-optimize without making performance worse in the end.

Comments closed

Dynamic Cursors in SQL Server

Published 2024-06-04 by Kevin Feasel

Hugo Kornelis continues a series on cursors:

We’re already at part 31 of the plansplaining series. And this is also the third part in my discussion of execution plans for cursors. After explaining the basics, and after diving into static cursors, it is now time to investigate dynamic cursors. As a quick reminder, recall that a static cursor presents data as it was when the cursor was opened (and does so by simply saving a snapshot of that data in tempdb), whereas a dynamic cursor is supposed to see all changes that are committed while the cursor is open. Let’s see how this change in semantics affects the execution plan.

Read on as Hugo gives it the college try and also admits he might be missing something in the explanation.

Comments closed

Impossible Execution Plan Timings

Published 2024-06-03 by Kevin Feasel

Paul White puts up an article:

I showed a hidden option to make all operators report only their individual times in

More Consistent Execution Plan Timings in SQL Server 2022

. That feature isn’t complete yet, so the results aren’t perfect, and it’s not documented or supported.

I mention all that in case you are interested in the background. None of the foregoing explains what we see in this mixed mode plan. The row mode Gather Streams elapsed time ought to include its children. The batch mode Sort should just be reporting its own elapsed time.

With that understanding in mind, there’s no way the Sort could run for longer than the Gather Streams. What’s going on here?

Read on for a Paul White-level discussion of the topic, including a demo from Erik Darling.

Comments closed

The Postgres Portal: Executor and Process Utility

Published 2024-05-21 by Kevin Feasel

Cary Huang digs into Postgres:

I wrote another blog to briefly explain the responsibility of each query processing stage. You can find it here. In this blog. There is one that focuses primarily on the planner module that you can find here. In this blog, we will focus on the executor part of the query processing, which is the entry point to “communicate” with other PostgreSQL internal modules to collect the correct data for the query.

In fact, executor may not be the best term to describe this stage. In the PostgreSQL source code, it is actually “packaged” in another object called “portal”. Often, and it has 2 potential paths to take, the “executor” path, or the “process utility” path. Very often, we just call this stage as “executor” rather than “portal”, because executor handles most of the DML query types that involve SELECT, INSERT, UPDATE, DELETE…etc and it has to process them according to the query plan created by the “planner” module.

Read on to learn more about the portal and its two component parts.

Comments closed

Optimized Locking in Azure SQL Database

Published 2024-05-14 by Kevin Feasel

Bob Pusateri locks the door:

One of the newer features introduced in Azure SQL Database is “Optimized Locking”. But what does that mean, and how does it compare to the apparently “un-optimized” locking the SQL relational engine has been using up to this point? Let’s dive in and take a look.

Read on to learn more about optimized locking and how it compares to the traditional form of locking available in SQL Server today.

Comments closed

A Deep Dive into Covering and Non-Covering Indexes

Published 2024-05-08 by Kevin Feasel

Etienne Lopes goes deeper than I thought:

A few days ago a client brought me a question, he had two tables with the same data, one in each database (same SQL version and compatibility level in different environments with similar infrastructure and configurations). Both tables had an extremely selective column and both had indexes in that column. The thing is that when running a particular query on one Database, let’s call it DB1 here, it was really fast whereas running the same query on the other database (DB2) it was a lot slower. The subset of rows that matched the WHERE clause was the same and it was really small in both cases (since the tables had the same data).

“Ok, let’s have a look at that”, I said, and when doing so, the query running on DB1 was doing a Clustered Index Seek whereas the same query in DB2 was doing a Table Scan. Didn’t I say that both tables had indexes in that column?

Read on for the post. Etienne warns that it’s a big boy of a post, and it covers a lot, including “seek” operations that are scans in disguise, index levels, and more.

Comments closed

M	T	W	T	F	S	S
	1	2	3	4	5	6
7	8	9	10	11	12	13
14	15	16	17	18	19	20
21	22	23	24	25	26	27
28	29	30	31

Category: Internals