Press "Enter" to skip to content

Category: Internals

Postgres Internals: Database Clusters, Databases, and Tables

Semab Tariq begins a new series:

A database cluster is a collection of multiple databases managed by a single PostgreSQL server. It can be referred to as a data/base directory.

A database is a collection of database objects. Whereas a database object is a data structure used to store objects such as tables, views, indexes, extensions, Sequences functions, etc. In simple words, anything that we can create or store within a database is a database object

Read on to learn more about how Postgres lays out database files and tablespaces.

Leave a Comment

Hash Match and Stream Aggregate Operators in SQL Server

Andy Brownsword rounds up the usual suspects:

In the last post we looked at how TOP and MAX operators compared. We saw the execution plan for a MAX function used a Stream Aggregate operator which is one of two which we can use for aggregation

I wanted to look at the two operators and how they perform the same tasks in different ways. The way they function is key to understanding why the engine may choose to use one over the other and the impact this can have on the performance of a query.

The two operators in question: the Hash Match (Aggregate) and Stream Aggregate

Read on for a discussion of how each operator works and when each makes sense for the optimizer to use.

Leave a Comment

Hash Aggregates and Hash Joins in Postgres

Muhammad Ali plays matchmaker:

PostgreSQL employs various techniques for data joining and aggregation in its queries, among which the hash-based method stands out for its efficiency in particular situations and different data sizes. We will discuss hash joins and hash aggregates in PostgreSQL, providing insights on how they work and parameters which influence this algorithm.

Read on to learn more. This looks fundamentally similar to hash matches in SQL Server, so if you’re familiar with that, the concepts should be pretty clear.

Comments closed

Fast-Track Optimization and Window Functions

Hugo Kornelis digs into a performance improvement the SQL Server optimizer has for window functions:

This is part twenty-four of the plansplaining series. In the previous part, I explained the execution plans for basic window functions, with and without a window frame. Especially the latter group performed quite poorly in the examples. So let’s now look at an optimization that SQL Server can apply to most cases, that prevents this rather bad scaling.

Click through to see what this is, how it works, and when the optimizer is able to use it.

Comments closed

Execution Plans for Window Functions

Hugo Kornelis digs in:

In his invitation, Steve specifically asks for examples where Window functions provided a neat solution to a real world problem. Well, sorry Steve, but I am not going to do that. But your invitation did inspire to me write about the execution plans for these window functions. And there is, in fact, so much to write about it, that this is just the first part.

So that makes this post not only a T-SQL Tuesday contribution, but also part 23 in my ongoing plansplaining series: blog posts where I take an in depth look at execution plans to explain how exactly they work, and point out often overlooked details. In this post, I will look at the basics of window functions, as they have existed for over 10 years now. I will point out a few interesting performance gotchas. And there are some links to feedback items that you can upvote (if you are so inclined) to pressure Microsoft to make some changes to the optimizer and the execution engine, to get some small but still welcome performance gains.

Read on to get an in-depth understanding of how those window functions actually operate once the database engine gets ahold of your query.

Comments closed

Why Batch Mode Sort Spills are Slow

Paul White unravels a mystery:

Batch mode sorting was added to SQL Server in the 2016 release under compatibility level 130. Most of the time, a batch mode sort will be much faster than the row mode equivalent.

This post is about an important exception to this rule, as recently reported by Erik Darling (video).

No doubt you’ll visit both links before reading on, but to summarize, the issue is that batch mode sorts are very slow when they spill—much slower than an equivalent row mode sort.

Read the whole thing. Paul does a great job illuminating us.

Comments closed

Packed/Batched Remote Procedure Calls in SQL Server

Bob Dorr digs in:

The SQL Server TDS protocol provides two main paths for query execution (Language and RPC events.)  You can trace these events using the Batch::Starting/Completed (Language) and RPC:Starting/Completed (RPC) XEvents.

Language events are text streams requiring full parsing and may be susceptible to injection attacks.  Language events also require educated guesses.  For example, should select 1 return a smallint, bigint?

Bob goes on to show an example of a simple call being susceptible to SQL injection, explains why Remote Procedure Calls (RPCs) are superior, and what packing (or batching) RPCs does for you.

Comments closed

Fast Key Optimization in SQL Server

Paul White explains the performance benefits of fast key optimization and when you can get them:

SQL Server can be called upon to sort a variety of data types. To facilitate this, the sorting code normally calls out to a specific comparator to determine how two compared values should sort: lower, higher, or equal.

Although calling comparator code has low overhead, performing enough of them can cause noticeable performance differences.

To address this, SQL Server has always (since at least version 7) supported a fast key optimization for simple data types. 

Click through to learn which data types support fast key optimization and to gain a feeling of the performance impact.

Comments closed

The Internals of Backup Compression

Andy Yun continues a series on how backups work in SQL Server:

Welcome back to Part 4 of my Backup Internals series. Today, I’d like to spend a little time exploring backup compression.

When you take a regular FULL BACKUP, SQL Server is literally taking a byte-for-byte copy of your data files. Don’t believe me? Then go read this, then come back. Additionally, Microsoft architected BACKUP operations such that the resource utilization and impact would be minimal (when using default parameters).

This post taught me a few things about the practical impact of enabling compression. Even after reading this, however, I would almost always enable it for two reasons. First, storage is usually the bottleneck for organizations, so actions which reduce storage utilization can improve overall performance. Second, there are limits to how much we can store, so compressing backups may let me get away with holding more backups on a given LUN or drive.

Comments closed

Lost Updates with RCSI

Kendra Little shares a warning:

There are two isolation levels in SQL Server that use optimistic locking for disk-based tables:

  1. Read Committed Snapshot Isolation (RCSI), which changes the implementation of the default Read Committed Isolation level and enables statement-based consistency.
  2. Snapshot Isolation, which provides high consistency for transactions (which often contain multiple statements). Snapshot Isolation also provides support for identifying update conflicts.

Many folks get pretty nervous about RCSI when they learn that certain timing effects can happen with data modifications that don’t happen under Read Committed. The irony is that RCSI does solve many OTHER timing risks in Read Committed, and overall is more consistent, so sticking with the pessimistic implementation of Read Committed is not a great solution, either.

I don’t recall getting any kinds of update errors with RCSI and I’ve used it in some pretty heavy workloads.

Comments closed