Day: June 19, 2025

From Pandas to Polars

Ivan Palomares Carrascosa provides an introduction to the polars library:

Polars is currently one of the fastest open-source libraries for data manipulation and processing on a single machine, featuring an intuitive and user-friendly API. Natively built in Rust, it is designed to optimize low memory consumption and speed while working with DataFrames.

This article takes a tour of the Polars library in Python and illustrates how it can be seamlessly used similarly to Pandas to efficiently manipulate large datasets.

My experience with polars is that it’s not a 1:1 replacement for pandas, but the interfaces are similar enough that a lot of code can swap over without much effort. And yes, it’s typically faster.

Diskless Kafka in Practice

Hugh Evans lays it out:

I joined Aiven as a Developer Advocate in May, shortly after the Kafka Improvement Proposal KIP-1150: Diskless Topics was announced, which reduces the total cost of ownership of Kafka by up to 80%! It was very exciting to join Aiven just as the streaming team were making this major contribution to open source but I wanted to take my time to understand the KIP before sharing my thoughts.

In this article I’ll share my first impressions of Diskless Kafka, walk you through a simple example you can use to experiment with Diskless, and highlight some of the great resources that are out there for learning about the topic. First though, what actually is Diskless Kafka?

Click through for that answer, as well as more.

Myths of the DBA-less Cloud

Kevin Hill has a reminder for us:

Here’s a common theme I hear from small IT teams:

“Our SQL Server is in the cloud now. We don’t need a DBA.”

Not quite.

When you’re running SQL Server on cloud virtual machines like Azure SQL VM or AWS EC2, you’re still responsible for a lot of the same challenges you faced on-prem. The difference? Now you’re paying cloud rates for the same headaches. But you don’t have to deal with hardware and some of the other infrastructure hassles.

Read on to see what that entails in practice. Though I’m pretty sure my target audience generally understands this and it’s people two or three levels up who should give Kevin’s post a read.

Building a Better Trigger

Steve Jones is done with mousetraps:

This seemed to work, but did it really?

The Problem

Let’s illustrate the big problem with this change. I’ll run this code:

Steve hit on one of the problems in trigger design, but there's another that we often see, though Steve's code avoids it: people assume that rows update one at a time. If you write a T-SQL UPDATE statement that affects five rows, the trigger fires once for all five, not once per row. Steve correctly uses the inserted and deleted pseudo-tables to handle all changes, rather than assuming a single row.
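
The single-row assumption is easy to see in a toy model. The sketch below is plain Python rather than T-SQL, and every name in it (`Row`, `run_update`, the two handlers) is hypothetical; it only imitates the behavior described above, where one statement hands the trigger the full `inserted` and `deleted` row sets in a single call.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    id: int
    price: float


def naive_trigger(inserted, deleted, audit):
    # Bug: assumes the statement touched exactly one row,
    # so only the first changed row is ever audited.
    new, old = inserted[0], deleted[0]
    audit.append((new.id, old.price, new.price))


def set_based_trigger(inserted, deleted, audit):
    # Pair every new row with its old version by key,
    # much like joining inserted to deleted on the primary key.
    old_by_id = {row.id: row for row in deleted}
    for new in inserted:
        audit.append((new.id, old_by_id[new.id].price, new.price))


def run_update(table, predicate, new_price, trigger, audit):
    # Model of one UPDATE statement: collect old and new row images,
    # apply the change, then call the trigger once for the whole statement.
    deleted = [row for row in table if predicate(row)]
    inserted = [Row(row.id, new_price) for row in deleted]
    table[:] = [Row(r.id, new_price) if predicate(r) else r for r in table]
    trigger(inserted, deleted, audit)


naive_audit, set_audit = [], []

table = [Row(i, 10.0) for i in range(1, 6)]
run_update(table, lambda r: True, 12.0, naive_trigger, naive_audit)

table = [Row(i, 10.0) for i in range(1, 6)]
run_update(table, lambda r: True, 12.0, set_based_trigger, set_audit)

print(len(naive_audit), len(set_audit))
```

With five rows updated in one statement, the naive handler records a single audit entry while the set-based one records all five.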

Thoughts on Multi-Primary Replication in PostgreSQL

Jan Wieremjewicz wants to know if you really need it:

Active-active, also referred to as multi-primary, is a setup where multiple database nodes can accept writes at the same time and propagate those changes to the others. In comparison, regular streaming replication in PostgreSQL allows only one node (the primary) to accept writes. All other nodes (replicas) are read-only and follow changes.

In an active-active setup:

  • There is no single point of write.
  • Applications can write to any node.
  • The database needs a way to sort out conflicts when two nodes try to concurrently change the same data.
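
As a rough illustration of that last point, here is a minimal sketch of one possible conflict-resolution strategy, last-write-wins. This is an assumption chosen for simplicity, not necessarily what the article or any particular PostgreSQL tool uses, and the `Write` type and `merge` helper are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Write:
    key: str
    value: str
    timestamp: float  # commit time recorded by the originating node
    node: str         # originating node name, used only as a tie-breaker


def resolve_last_write_wins(a: Write, b: Write) -> Write:
    # Keep the later write; break timestamp ties on node name so that
    # every node picks the same winner.
    return max(a, b, key=lambda w: (w.timestamp, w.node))


def merge(local: Dict[str, Write], incoming: Iterable[Write]) -> Dict[str, Write]:
    # Apply writes replicated from another node to a copy of the local state.
    merged = dict(local)
    for write in incoming:
        current = merged.get(write.key)
        merged[write.key] = (
            write if current is None else resolve_last_write_wins(current, write)
        )
    return merged


# Two nodes accept a write to the same key at nearly the same time.
node_a = {"user:1": Write("user:1", "alice@old.example", 100.0, "a")}
node_b = {"user:1": Write("user:1", "alice@new.example", 101.0, "b")}

# Each node then applies the other's changes.
a_after = merge(node_a, node_b.values())
b_after = merge(node_b, node_a.values())

print(a_after["user:1"].value, b_after["user:1"].value)
```

Both nodes converge on the same value, but the earlier write is silently discarded, which is one reason conflict handling is usually the hard part of a multi-primary setup.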

Read on for some of the challenges around this, tooling that is available, and thoughts on when this can be important.

The New Item Creation Experience in Microsoft Fabric

Dan Liu has an announcement:

Have you ever found yourself frustrated by inconsistent item creation? Maybe you’ve struggled to select the right workspace or folder when creating a new item or ended up with a cluttered workspace due to accidental item creation.

We hear you—and we’re excited to introduce the new item creation experience in Fabric! This update is designed to address those pain points head-on. With the new unified creation flow, you’ll have a streamlined way to select the exact workspace and folder where your items belong. More importantly, this improvement brings consistency and cohesion to item creation across all Fabric items, so you can stay organized and productive.

Click through to see what it looks like.
