Press "Enter" to skip to content

Author: Kevin Feasel

Kafka Consumer Offset Changes with KIP-1094

Alieh Saeedi looks at a change in Apache Kafka 4.0.0:

Consumer offsets are at the heart of Apache Kafka®’s robust data handling capabilities, as they determine how data is consumed, reprocessed, or skipped across topics and partitions. In this comprehensive guide, we delve into the intricacies of Kafka offsets, covering everything from the necessity of manual offset control to the nuanced challenges posed by offset management in distributed environments. We further explore the solutions and enhancements introduced by KIP-1094 (available in Kafka 4.0.0), offering a closer look at how it addresses these challenges by enabling more accurate and reliable offset and leader epoch information retrieval.

Click through for an overview of how consumer behavior works, as well as what KIP-1094 does.

Comments closed

When to Use a Python Notebook vs Spark Notebook in Microsoft Fabric

Gilbert Quevauvilliers lays out the plan:

This is the first blog post in a series of blog posts where I dive into how to use Python notebooks instead of Spark notebooks. For example, I will show you how to run a SQL query from a Lakehouse table and get it into a data frame. Read and write to a Lakehouse table and more.

NOTE: This is still in preview, but I personally think that this is worth investing time in learning.

The reason I am using the term Python is because the notebook can ONLY use Python and not any of the other languages available in a Spark

Also, in fairness, I’ve heard people working on Microsoft Fabric within the company reference these as ‘Python notebooks,’ so Gilbert is in good company.

Comments closed

Session-Scoped Temp Tables in Microsoft Fabric now GA

Twinkle Cyril gets something GA:

Introducing distributed session-scoped temporary (#temp) tables in Fabric Data Warehouse and Fabric Lakehouse SQL Endpoints.

#temp tables have been a feature of Microsoft SQL Server (and other database systems) for many years. In the current implementation of Fabric data warehouse, #temp tables are session scoped or local temp tables. Global temp tables are not included in this release.

Session-scoped #temp tables exist only within the session in which they are created and last only for the duration of that session. They are not visible to other users or sessions and are automatically dropped from the system once the session ends or the user decides to drop the temp table. These tables are accessible to all users without requiring specific artifact-level permission.

Click through for examples of how it works and how you can specify a session-level temp table over a local temp table.

Comments closed

Restoring a Single Data Page in SQL Server

Stephen Planck turns the page:

Most of the time, corruption in SQL Server is either nonexistent or so widespread that you have no choice but to perform a file or full‑database restore. Yet an awkward middle ground exists: a handful of pages—perhaps only one—become unreadable while the rest of the database remains perfectly healthy. A full restore would repair the damage, but at the cost of rolling back hours of work and locking users out of an otherwise functional system.

That is precisely why Microsoft built RESTORE … PAGE. When you meet a short list of prerequisites (FULL or BULK_LOGGED recovery model, an unbroken backup chain, and a page that is not allocation metadata), you can surgically overwrite just the bad 8‑KB chunks, roll them forward with transaction‑log backups, and return the database to service in minutes rather than hours.

Read on to see how it all works, as well as situations in which this isn’t the right answer.

Comments closed

Expression Reordering in PostgreSQL

Andrei Lepikhov speeds up a query:

Occasionally, you may come across queries featuring complex filters similar to the following:

SELECT * FROM table
WHERE
  date > min_date AND
  date < now() - interval '1 day' AND
  value IN Subplan AND
  id = 42';

And in practice, it happens that a simple rearrangement of the order of conditions in such an expression allows for speeding up (sometimes quite notably) the query execution time. Why?

Read on for the answer. In a perfect world, SQL is a 4th generation language and the order of operations should make zero difference for query performance. In practice, as Andrei shows, this is a challenge for the developers of the relational databases we use.

Comments closed

Pitfalls in Software Testing

Ngọc lê builds a list:

Software testing plays a crucial role in ensuring the delivery of high-quality products. However, even experienced testers can fall into common traps that compromise the effectiveness of testing processes, allowing defects to slip into production. Avoiding these pitfalls is essential for maintaining the reliability and functionality of software. In this blog, we’ll explore some of the most common software testing mistakes and provide strategies to overcome them.

This is specifically for software testing, but most of these principles apply the same for database work.

Comments closed

Comparing Microsoft Fabric to Snowflake

Evanjalin Joseph lays out a comparison:

Take ShopSmart, a global retail chain that operates both online and offline. The company wants to combine its sales, inventory, and customer data in order to facilitate real-time reporting and predictive analytics. Two top platforms are being assessed by the IT team for this change.

Azure, Power BI, and Microsoft 365 are already widely used by ShopSmart, which is in line with Fabric’s integrated ecosystem. The alternative, however, provides more multi-cloud flexibility and strong performance on structured data. The group has to choose between selecting a more specialized warehousing solution with more deployment options or making use of its current Microsoft investments.

Let’s examine the differences between the two platforms.

Click through for an overview of each platform and how they stack up against one another.

Comments closed

Digging into the Microsoft Fabric Monitoring Hub

Reitse Eskens shows off Microsoft Fabric’s Monitor hub:

When we create a data solution for our company or clients, we create the best, right? Sure, we do. But still, monitoring the solution is something you should never forget. Your solution can be stable, but the data usually isn’t. And it just feels good to know all the data has been processed correctly.

So, let’s dig into monitoring Microsoft Fabric.

Read on to see how it works, as well as plenty of thoughts, advice, and critique from Reitse.

Comments closed