Press "Enter" to skip to content

Curated SQL Posts

A Verbose Pipe Operator for dplyr Pipelines

Guillaume Pressiat shows off logrittr:

In SAS, every DATA step prints a log:

NOTE: There were 120000 observations read from WORK.SALES.
NOTE: 7153 observations were deleted.
NOTE: The data set WORK.RESULT has 112847 observations and 11 variables.

R’s dplyr pipelines are silent. logrittr fills that gap with %>=%, a drop-in pipe that logs row counts, column counts, added/dropped columns, and timing at every step, with no function masking.

Click through to see how logrittr helps. Back when I was using R heavily, I would have really enjoyed this package. H/T R-Bloggers.

Leave a Comment

Materialized Tables in Apache Flink

Robin Moffatt digs into a neat feature in Apache Flink:

Flink added support for what it calls Materialized Tables in 1.20, released in 2024. You can read about the design and motivations in FLIP-435. In a nutshell, Materialized Tables provide a way to include the SQL to populate and refresh a table as part of its definition.

Let’s take a look!

Robin takes a deep dive into it, figures out several issues you might run into along the way, and provides a verdict at the end of the post. In addition, the GitHub repo includes a Docker Compose file you can use to follow along.

Leave a Comment

Finding Expensive Redshift Queries

Eduardo Pivaral searches for expensive queries:

Slow-running queries can degrade your Redshift cluster’s performance and lead to increased costs. Identifying the most expensive queries is crucial to optimize resource usage and improve overall system efficiency.

My immediate answer was “all of them,” though in fairness, I’ve primarily needed to deal with situations in which people set up Redshift without using Kimball-style modeling and queried with (essentially) SELECT *.

Leave a Comment

The State of the Database Market

Simple Talk’s editor digs into the data:

Leading research and analysis firm Gartner recently revealed its DBMS Market Share Ranks for the 2011-2025 period, and it shows a clear pattern. That is: while the dominant database vendors are losing their stranglehold on the market, it’s happening very slowly – so don’t expect to see big changes at the top any time soon.

It’s a trend already uncovered in the Redgate DB-Engines rankings in recent times, despite it using a very different set of metrics compared to Gartner’s analysis.

I had no idea that Redgate owned DB-Engines. But there’s some interesting information to come out of these results, especially because they come at the problem from very different angles.

Leave a Comment

Alerting on Long-Running SQL Queries and SQL Agent Jobs

Temidayo Omoniyi sends an e-mail:

Have you ever waited for an eternity, waiting for either a query or SQL Agent Jobs to run? This is something most Data Warehouse Developers face daily.

Click through to see how you can use database mail to track long-running tasks. My primary hang-up with solutions like this is, what are you going to do about the e-mail? If there is no concrete action you can take, the most likely outcome will be to ignore the e-mail. This makes it harder to sift out the true positive you need to look into versus the false positives that happen every day.

Leave a Comment

Fabric Eventstreams SQL Operator now GA

Vaibhav Shrivastava makes an announcement:

The SQL operator was first introduced in preview to give customers an early look at a code‑first transformation experience in Fabric Eventstreams. During preview, customers used the SQL operator to simplify real-time pipelines, consolidate transformation logic, and unlock advanced scenarios using familiar SQL semantics. Feedback from this phase directly shaped the GA release—driving improvements across multi‑destination support, event‑time processing, testing capabilities, and overall production readiness.

Building on this preview momentum, we’ve reached the next milestone for SQL operator in Fabric Eventstreams, a powerful, code‑first way to transform and route data across Fabric’s Real-Time Intelligence experiences. This complements Eventstream’s no-code capabilities, giving engineers the flexibility to choose the right abstraction for their scenarios.

With this release, you can define transformation and routing logic once using familiar SQL semantics and seamlessly deliver streaming results to multiple destinations in parallel—all from a single operator.

Read on to see what’s changed since the public preview.

Leave a Comment

Bulk Loading Data with mssql-python

Chad Callihan loads some data:

I’ve had some projects in the past that involved using Python to load data in SQL Server. It wasn’t unbearably slow, but it seemed like a process that could be faster. For that reason, a recent SQL Server blog post about bulk loading data with Python caught my eye. I decided to test out the new mssql-python 1.4.0 mentioned in that post and see how much of an impact it would make on loading speed.

Chad saw about a 10x improvement in performance. I’ve had some similar results in production environments. The mssql-python library is a legitimate improvement over the classic ODBC driver and pyodbc.

Leave a Comment

The Current State of Open Source

Gabriel Anhaia gives a rundown:

Something is breaking in open source. Not the code. The social contracts.

The open-source world isn’t collapsing from lack of funding or contributor burnout (though both are real). It’s fracturing because organizations that built empires on community-written software decided the original deal doesn’t work for them anymore. Forks are multiplying. Licenses are mutating. Lawyers are circling.

The volunteer maintainers who actually wrote most of this code? They’re stuck in the crossfire.

Read on for a variety of examples around sovereignty rules and some of the licensing shenanigans going on in major products. H/T Cristophe Pettus.

Leave a Comment

Thoughts on Database Projects in SSMS 22.4

Andy Brownsword shares some thoughts:

Last month in SSMS v22.4, we had the Database DevOps (preview) workload introduced which brings with it Database Projects. Last week I shared my favourite features for them which make deployments amazing.

But if you’ve tried them out in SSMS you might have noticed that not everything is present. It’s a preview, after all. So what can we do today?

Read on for that answer.

Leave a Comment