Press "Enter" to skip to content

Day: April 30, 2026

A Verbose Pipe Operator for dplyr Pipelines

Guillaume Pressiat shows off logrittr:

In SAS, every DATA step prints a log:

NOTE: There were 120000 observations read from WORK.SALES.
NOTE: 7153 observations were deleted.
NOTE: The data set WORK.RESULT has 112847 observations and 11 variables.

R’s dplyr pipelines are silent. logrittr fills that gap with %>=%, a drop-in pipe that logs row counts, column counts, added/dropped columns, and timing at every step, with no function masking.

Click through to see how logrittr helps. Back when I was using R heavily, I would have really enjoyed this package. H/T R-Bloggers.

Leave a Comment

Materialized Tables in Apache Flink

Robin Moffatt digs into a neat feature in Apache Flink:

Flink added support for what it calls Materialized Tables in 1.20, released in 2024. You can read about the design and motivations in FLIP-435. In a nutshell, Materialized Tables provide a way to include the SQL to populate and refresh a table as part of its definition.

Let’s take a look!

Robin takes a deep dive into it, figures out several issues you might run into along the way, and provides a verdict at the end of the post. In addition, the GitHub repo includes a Docker Compose file you can use to follow along.

Leave a Comment

Finding Expensive Redshift Queries

Eduardo Pivaral searches for expensive queries:

Slow-running queries can degrade your Redshift cluster’s performance and lead to increased costs. Identifying the most expensive queries is crucial to optimize resource usage and improve overall system efficiency.

My immediate answer was “all of them,” though in fairness, I’ve primarily needed to deal with situations in which people set up Redshift without using Kimball-style modeling and queried with (essentially) SELECT *.

Leave a Comment

The State of the Database Market

Simple Talk’s editor digs into the data:

Leading research and analysis firm Gartner recently revealed its DBMS Market Share Ranks for the 2011-2025 period, and it shows a clear pattern. That is: while the dominant database vendors are losing their stranglehold on the market, it’s happening very slowly – so don’t expect to see big changes at the top any time soon.

It’s a trend already uncovered in the Redgate DB-Engines rankings in recent times, despite it using a very different set of metrics compared to Gartner’s analysis.

I had no idea that Redgate owned DB-Engines. But there’s some interesting information to come out of these results, especially because they come at the problem from very different angles.

Leave a Comment

Alerting on Long-Running SQL Queries and SQL Agent Jobs

Temidayo Omoniyi sends an e-mail:

Have you ever waited for an eternity, waiting for either a query or SQL Agent Jobs to run? This is something most Data Warehouse Developers face daily.

Click through to see how you can use database mail to track long-running tasks. My primary hang-up with solutions like this is, what are you going to do about the e-mail? If there is no concrete action you can take, the most likely outcome will be to ignore the e-mail. This makes it harder to sift out the true positive you need to look into versus the false positives that happen every day.

Leave a Comment