Press "Enter" to skip to content

Author: Kevin Feasel

Materialized Tables in Apache Flink

Robin Moffatt digs into a neat feature in Apache Flink:

Flink added support for what it calls Materialized Tables in 1.20, released in 2024. You can read about the design and motivations in FLIP-435. In a nutshell, Materialized Tables provide a way to include the SQL to populate and refresh a table as part of its definition.

Let’s take a look!

Robin takes a deep dive into it, figures out several issues you might run into along the way, and provides a verdict at the end of the post. In addition, the GitHub repo includes a Docker Compose file you can use to follow along.

Leave a Comment

Finding Expensive Redshift Queries

Eduardo Pivaral searches for expensive queries:

Slow-running queries can degrade your Redshift cluster’s performance and lead to increased costs. Identifying the most expensive queries is crucial to optimize resource usage and improve overall system efficiency.

My immediate answer was “all of them,” though in fairness, I’ve primarily needed to deal with situations in which people set up Redshift without using Kimball-style modeling and queried with (essentially) SELECT *.

Leave a Comment

The State of the Database Market

Simple Talk’s editor digs into the data:

Leading research and analysis firm Gartner recently revealed its DBMS Market Share Ranks for the 2011-2025 period, and it shows a clear pattern. That is: while the dominant database vendors are losing their stranglehold on the market, it’s happening very slowly – so don’t expect to see big changes at the top any time soon.

It’s a trend already uncovered in the Redgate DB-Engines rankings in recent times, despite it using a very different set of metrics compared to Gartner’s analysis.

I had no idea that Redgate owned DB-Engines. But there’s some interesting information to come out of these results, especially because they come at the problem from very different angles.

Leave a Comment

Alerting on Long-Running SQL Queries and SQL Agent Jobs

Temidayo Omoniyi sends an e-mail:

Have you ever waited for an eternity, waiting for either a query or SQL Agent Jobs to run? This is something most Data Warehouse Developers face daily.

Click through to see how you can use database mail to track long-running tasks. My primary hang-up with solutions like this is, what are you going to do about the e-mail? If there is no concrete action you can take, the most likely outcome will be to ignore the e-mail. This makes it harder to sift out the true positive you need to look into versus the false positives that happen every day.

Leave a Comment

Fabric Eventstreams SQL Operator now GA

Vaibhav Shrivastava makes an announcement:

The SQL operator was first introduced in preview to give customers an early look at a code‑first transformation experience in Fabric Eventstreams. During preview, customers used the SQL operator to simplify real-time pipelines, consolidate transformation logic, and unlock advanced scenarios using familiar SQL semantics. Feedback from this phase directly shaped the GA release—driving improvements across multi‑destination support, event‑time processing, testing capabilities, and overall production readiness.

Building on this preview momentum, we’ve reached the next milestone for SQL operator in Fabric Eventstreams, a powerful, code‑first way to transform and route data across Fabric’s Real-Time Intelligence experiences. This complements Eventstream’s no-code capabilities, giving engineers the flexibility to choose the right abstraction for their scenarios.

With this release, you can define transformation and routing logic once using familiar SQL semantics and seamlessly deliver streaming results to multiple destinations in parallel—all from a single operator.

Read on to see what’s changed since the public preview.

Leave a Comment

The Current State of Open Source

Gabriel Anhaia gives a rundown:

Something is breaking in open source. Not the code. The social contracts.

The open-source world isn’t collapsing from lack of funding or contributor burnout (though both are real). It’s fracturing because organizations that built empires on community-written software decided the original deal doesn’t work for them anymore. Forks are multiplying. Licenses are mutating. Lawyers are circling.

The volunteer maintainers who actually wrote most of this code? They’re stuck in the crossfire.

Read on for a variety of examples around sovereignty rules and some of the licensing shenanigans going on in major products. H/T Cristophe Pettus.

Leave a Comment

Bulk Loading Data with mssql-python

Chad Callihan loads some data:

I’ve had some projects in the past that involved using Python to load data in SQL Server. It wasn’t unbearably slow, but it seemed like a process that could be faster. For that reason, a recent SQL Server blog post about bulk loading data with Python caught my eye. I decided to test out the new mssql-python 1.4.0 mentioned in that post and see how much of an impact it would make on loading speed.

Chad saw about a 10x improvement in performance. I’ve had some similar results in production environments. The mssql-python library is a legitimate improvement over the classic ODBC driver and pyodbc.

Leave a Comment

Thoughts on Database Projects in SSMS 22.4

Andy Brownsword shares some thoughts:

Last month in SSMS v22.4, we had the Database DevOps (preview) workload introduced which brings with it Database Projects. Last week I shared my favourite features for them which make deployments amazing.

But if you’ve tried them out in SSMS you might have noticed that not everything is present. It’s a preview, after all. So what can we do today?

Read on for that answer.

Leave a Comment

Defining the Fabric Ontology

Mike Donnelly explains a term:

The short version: Fabric Ontology is the semantic backbone of Microsoft Fabric. It’s the layer that defines what your business data actually means.

If you’ve ever worked in a large organization, you know the problem. One team calls them “Customers,” another calls them “Clients,” and a third calls them “Account Holders.” Without a shared meaning, your analytics become a mess of conflicting vocabularies. An ontology is just a structured way of naming things and describing their relationships so everyone—and every tool—is using the same dictionary.

I think this is all correct, but I think there’s something more to ontologies than that. At least in the Palantir world, the ontology is not just the business definitions and concepts, but it’s also the actions you can take against that data. In other words, you might have Customers, Clients, and Account Holders. You can add a new customer, update the customer details, send a welcome to a new account holder, etc. Each of these actions is baked into the ontology as well. That’s what makes it different from simply defining where the data lives and how we got it in its current shape.

Leave a Comment