Press "Enter" to skip to content

Day: April 23, 2025

Kafka Consumer Offset Changes with KIP-1094

Alieh Saeedi looks at a change in Apache Kafka 4.0.0:

Consumer offsets are at the heart of Apache Kafka®’s robust data handling capabilities, as they determine how data is consumed, reprocessed, or skipped across topics and partitions. In this comprehensive guide, we delve into the intricacies of Kafka offsets, covering everything from the necessity of manual offset control to the nuanced challenges posed by offset management in distributed environments. We further explore the solutions and enhancements introduced by KIP-1094 (available in Kafka 4.0.0), offering a closer look at how it addresses these challenges by enabling more accurate and reliable offset and leader epoch information retrieval.

Click through for an overview of how consumer behavior works, as well as what KIP-1094 does.
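
As background for the manual offset control the excerpt mentions, here is a minimal sketch of a consumer that commits offsets by hand, using the confluent-kafka Python client. The broker address, group id, topic, and handler are all hypothetical, and note that KIP-1094's new API itself is a change to the Kafka Java consumer, not to this client:

from confluent_kafka import Consumer, TopicPartition

def process(msg):
    # Hypothetical handler: replace with real business logic
    print(msg.topic(), msg.partition(), msg.offset())

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",  # assumed broker address
    "group.id": "demo-group",               # hypothetical consumer group
    "enable.auto.commit": False,            # take manual control of offsets
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["orders"])              # hypothetical topic

try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None:
            continue
        if msg.error():
            raise RuntimeError(msg.error())
        process(msg)
        # Commit the offset of the *next* record to read, and only after
        # the current record has been processed successfully
        consumer.commit(
            offsets=[TopicPartition(msg.topic(), msg.partition(), msg.offset() + 1)],
            asynchronous=False,
        )
finally:
    consumer.close()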

Comprehensions in Python

I have a new video:

In this video, I show how to use comprehensions in Python to generate lists, dictionaries, and sets. I also run a quick performance test, comparing a list comprehension to an equivalent for loop.

It can take a little bit of time to get used to the syntax, but once you do, comprehensions are quite powerful.
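
For reference, here are the three comprehension forms the video covers, plus a quick timing comparison along the same lines (exact figures will vary by machine):

from timeit import timeit

squares = [n * n for n in range(10)]                     # list comprehension
square_map = {n: n * n for n in range(10)}               # dictionary comprehension
even_squares = {n * n for n in range(10) if n % 2 == 0}  # set comprehension with a filter

def with_loop():
    result = []
    for n in range(1_000):
        result.append(n * n)
    return result

def with_comprehension():
    return [n * n for n in range(1_000)]

print(timeit(with_loop, number=10_000))           # the explicit loop...
print(timeit(with_comprehension, number=10_000))  # ...is typically a bit slower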

When to Use a Python Notebook vs Spark Notebook in Microsoft Fabric

Gilbert Quevauvilliers lays out the plan:

This is the first blog post in a series where I dive into how to use Python notebooks instead of Spark notebooks. For example, I will show you how to run a SQL query against a Lakehouse table and get the results into a data frame, how to read from and write to a Lakehouse table, and more.

NOTE: This is still in preview, but I personally think that this is worth investing time in learning.

The reason I am using the term Python is because the notebook can ONLY use Python and not any of the other languages available in a Spark notebook.

Also, in fairness, I’ve heard people working on Microsoft Fabric within the company reference these as ‘Python notebooks,’ so Gilbert is in good company.
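
As a taste of what the series covers, here is a hedged sketch of reading a Lakehouse table and querying it with SQL from a Python notebook. It assumes a default Lakehouse is attached (mounted at /lakehouse/default), that the deltalake and duckdb packages are available, and a hypothetical sales table; Gilbert's posts will show the supported patterns in detail:

import duckdb
from deltalake import DeltaTable

# Load a Lakehouse Delta table into a pandas data frame
df = DeltaTable("/lakehouse/default/Tables/sales").to_pandas()

# Query the data frame with SQL, no Spark session required
top_customers = duckdb.sql("""
    SELECT customer_id, SUM(amount) AS total
    FROM df
    GROUP BY customer_id
    ORDER BY total DESC
    LIMIT 10
""").df()
print(top_customers)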

Session-Scoped Temp Tables in Microsoft Fabric Now GA

Twinkle Cyril gets something GA:

Introducing distributed session-scoped temporary (#temp) tables in Fabric Data Warehouse and Fabric Lakehouse SQL Endpoints.

#temp tables have been a feature of Microsoft SQL Server (and other database systems) for many years. In the current implementation of Fabric data warehouse, #temp tables are session scoped or local temp tables. Global temp tables are not included in this release.

Session-scoped #temp tables exist only within the session in which they are created and last only for the duration of that session. They are not visible to other users or sessions and are automatically dropped from the system once the session ends or the user decides to drop the temp table. These tables are accessible to all users without requiring specific artifact-level permission.

Click through for examples of how it works and how you can specify a session-level temp table over a local temp table.
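
The syntax itself is the familiar T-SQL #temp pattern, which per the announcement now works in Fabric Data Warehouse. A minimal sketch with hypothetical table and column names:

-- Session-scoped: visible only to this session, dropped when it ends
CREATE TABLE #recent_orders
(
    order_id   INT,
    order_date DATE
);

INSERT INTO #recent_orders
SELECT order_id, order_date
FROM dbo.orders
WHERE order_date >= DATEADD(DAY, -7, GETDATE());

SELECT COUNT(*) AS recent_order_count FROM #recent_orders;

-- Dropped automatically at session end, or explicitly:
DROP TABLE #recent_orders;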

Expression Reordering in PostgreSQL

Andrei Lepikhov speeds up a query:

Occasionally, you may come across queries featuring complex filters similar to the following:

SELECT * FROM table
WHERE
  date > min_date AND
  date < now() - interval '1 day' AND
  value IN Subplan AND
  id = 42;

And in practice, it turns out that simply rearranging the order of conditions in such an expression can speed up query execution, sometimes quite notably. Why?

Read on for the answer. In a perfect world, SQL is a fourth-generation declarative language, and the order in which you write conditions should make zero difference to query performance. In practice, as Andrei shows, that remains a challenge for the developers of the relational databases we use.
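
To make the idea concrete, here is a hand-reordered version of the filter above, as a sketch: the cheap, highly selective equality comes first and the expensive subplan last, so most rows are rejected before the subplan ever runs (though the planner may still reorder conditions on its own):

SELECT * FROM table
WHERE
  id = 42 AND                          -- cheap and selective: test first
  date > min_date AND
  date < now() - interval '1 day' AND
  value IN Subplan;                    -- expensive subplan: test last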

Restoring a Single Data Page in SQL Server

Stephen Planck turns the page:

Most of the time, corruption in SQL Server is either nonexistent or so widespread that you have no choice but to perform a file or full‑database restore. Yet an awkward middle ground exists: a handful of pages—perhaps only one—become unreadable while the rest of the database remains perfectly healthy. A full restore would repair the damage, but at the cost of rolling back hours of work and locking users out of an otherwise functional system.

That is precisely why Microsoft built RESTORE … PAGE. When you meet a short list of prerequisites (FULL or BULK_LOGGED recovery model, an unbroken backup chain, and a page that is not allocation metadata), you can surgically overwrite just the bad 8‑KB chunks, roll them forward with transaction‑log backups, and return the database to service in minutes rather than hours.

Read on to see how it all works, as well as situations in which this isn’t the right answer.
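
For orientation, the documented sequence looks roughly like this; the database name, page ID, and backup paths are all hypothetical:

-- Find damaged pages recorded by the engine
SELECT database_id, file_id, page_id, event_type
FROM msdb.dbo.suspect_pages;

-- Restore just the damaged page from the last full backup
RESTORE DATABASE Sales
PAGE = '1:153'                       -- file_id:page_id
FROM DISK = N'C:\Backup\Sales_full.bak'
WITH NORECOVERY;

-- Roll the page forward with the existing log backup chain
RESTORE LOG Sales FROM DISK = N'C:\Backup\Sales_log1.trn' WITH NORECOVERY;

-- Back up the tail of the log, then restore it to bring the page current
BACKUP LOG Sales TO DISK = N'C:\Backup\Sales_tail.trn';
RESTORE LOG Sales FROM DISK = N'C:\Backup\Sales_tail.trn' WITH RECOVERY;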
