Author: Kevin Feasel

Queue-Style Batch Deletion Woes in PostgreSQL

Shayon Mukherjee runs into an interesting issue:

I recently discovered an unexpected behavior in PostgreSQL involving a pattern of using a Common Table Expression (CTE) with DELETE ... RETURNING and LIMIT to process a batch of items from a queue-like table. What seemed straightforward turned out to have a surprising interaction with the query planner.
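For context, here is a minimal sketch of that pattern. The table and column names are hypothetical, and the FOR UPDATE SKIP LOCKED clause is a typical queue-table addition rather than necessarily part of Shayon's example:

    -- Pop a batch from a queue-like table: select a limited batch of rows,
    -- delete them, and hand them back to the caller in one statement.
    -- The LIMIT lives in the CTE because DELETE does not accept one directly.
    WITH batch AS (
        SELECT id
        FROM jobs
        WHERE status = 'pending'
        ORDER BY id
        LIMIT 100
        FOR UPDATE SKIP LOCKED
    )
    DELETE FROM jobs
    WHERE id IN (SELECT id FROM batch)
    RETURNING jobs.*;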

Click through to see what happened, as well as Shayon’s recommendation on how to fix it.

Tips for Using Database Snapshots

Stephen Planck offers some advice:

Database snapshots are one of those features that’s been around forever, but still solves real-world problems with very little setup. In a single statement you can capture a point-in-time, read-only copy of any user database, use it for reporting or off-load testing, and—if disaster strikes—revert the source back to that snapshot in minutes. This guide explains how snapshots work under the hood, walks through day-to-day tasks (including creating the original database), and highlights the pitfalls you should plan for before using them in production.

One additional tip that I’d offer: limit yourself to one database snapshot per database. There’s a performance cliff you can hit when you get fancy and try to keep multiple snapshots of the same database going at the same time.
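For reference, the create-and-revert pair looks like this; the database name, logical file name, and snapshot path are illustrative:

    -- Create a point-in-time, read-only snapshot of the Sales database
    -- (one sparse file is needed per data file in the source)
    CREATE DATABASE Sales_Snap_20250101
    ON (NAME = Sales_Data, FILENAME = 'S:\Snapshots\Sales_Snap_20250101.ss')
    AS SNAPSHOT OF Sales;

    -- If disaster strikes, revert the source to the snapshot. Reverting
    -- only works when the source has exactly one snapshot, which is one
    -- more reason to stick to a single snapshot per database.
    RESTORE DATABASE Sales
    FROM DATABASE_SNAPSHOT = 'Sales_Snap_20250101';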

Creating Error Bars in ggplot2

Zhenguo Zhang draws a chart:

Sometimes you may want to create a plot with the following features:

  • a point to indicate the mean of a group
  • error bars to indicate the standard deviation of the group
  • and each group may have subgroups, which are represented by different colors.

In this post, I will show you how to create such a plot using the ggplot2 package in R.

Read on for the demonstration, as well as a fix for the common problem of overlapping data points. H/T R-Bloggers.

Kafka Data Exploration with Tableflow

Robin Moffatt does some exploratory data analysis:

One of the challenges that I’d always had when it came to building streaming data pipelines is that once data is in a Kafka topic, it becomes trickier to query. Whether limited by the available tools to do this or the speed of access, querying Kafka is just not a smooth experience.

This blog post will show you a really nice way of exploring and validating data in Apache Kafka®. We’ll use Tableflow to expose the Kafka topics as Apache Iceberg™ tables and then query them using standard SQL tools.

Click through for the demonstration using a real dataset.
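Once a topic shows up as an Iceberg table, exploration really is plain SQL from any Iceberg-aware engine. A hypothetical example, with topic and column names invented for illustration:

    -- Quick sanity checks on a Kafka topic exposed as an Iceberg table:
    -- row counts and recency per key, with no consumer code required
    SELECT device_id,
           COUNT(*)        AS reading_count,
           MAX(event_time) AS latest_reading
    FROM sensor_readings
    GROUP BY device_id
    ORDER BY reading_count DESC
    LIMIT 10;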

From Power BI Premium Capacity to Fabric Capacity

Jon Vöge performs a migration:

So your old Power BI Premium Capacity has run/is running out, and your organization is acquiring a new Fabric Capacity to replace it.

Perhaps the organization even decided to take the chance to move the capacity region to something a little closer to home?

If you find yourself in this situation, how do you best migrate the contents of one Capacity to another?

Read on as Jon explains the migration process within a region (which is very easy) and the migration process if you need to go cross-region (which is rather cumbersome).

SQL Server Performance Office Hours

Erik Darling is answering questions again:

My company (using SQL Server 2019 Standard) has an algorithm that keys addresses into a varchar(40), and has a cross-reference database that assigns an identity property to each new value, allowing us to post the numeric key in our datasets. Production has to search their generated string keys in this database’s table to get the integer key in return. We have ensured proper string data typing on lookups and have unique compressed indexes on the string values. What would your next tuning step be if this was not getting the performance you needed?
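Reading between the lines, the setup presumably looks something like this sketch (all object and parameter names are hypothetical):

    -- A string business key mapped to a compact integer surrogate via
    -- IDENTITY, with a unique compressed index to serve the lookups
    CREATE TABLE dbo.AddressXref
    (
        AddressId  int IDENTITY(1, 1) NOT NULL
            CONSTRAINT PK_AddressXref PRIMARY KEY,
        AddressKey varchar(40) NOT NULL
    );

    CREATE UNIQUE NONCLUSTERED INDEX UX_AddressXref_AddressKey
        ON dbo.AddressXref (AddressKey)
        WITH (DATA_COMPRESSION = PAGE);

    -- The hot path: given a generated string key, return the integer key
    DECLARE @GeneratedKey varchar(40) = 'A1B2C3D4E5';  -- hypothetical key
    SELECT AddressId
    FROM dbo.AddressXref
    WHERE AddressKey = @GeneratedKey;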

There’s a good set of questions this time, so click through for Erik’s answers.

Temp Table Bugs in Microsoft Fabric Warehouses

Jared Westover runs into a wall:

I was excited when Microsoft announced the ability to create session-scoped temporary tables in a Fabric warehouse. However, after using Microsoft Fabric temporary tables, I quickly felt disappointed. When will they be ready for prime time, and in the meantime, what other options are available?

Click through for Jared’s experience, although the issue might already be fixed.
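For reference, the feature uses the familiar # prefix from SQL Server; a minimal sketch, with an invented table definition:

    -- Session-scoped temporary table in a Fabric warehouse; it uses the
    -- same # prefix as SQL Server and disappears when the session ends
    CREATE TABLE #StagedOrders
    (
        OrderId   int,
        OrderDate date
    );

    INSERT INTO #StagedOrders
    VALUES (1, '2025-06-01');

    SELECT OrderId, OrderDate
    FROM #StagedOrders;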

Executing a Fabric Data Pipeline from Azure Data Factory

Koen Verbeeck leaves the confines of Microsoft Fabric:

In the blog post Call a Fabric REST API from Azure Data Factory I explained how you can call a Fabric REST API endpoint from Azure Data Factory (or Synapse if you will). Let’s go a step further and execute a Fabric Data Pipeline from an ADF pipeline, which is a common request. A Fabric capacity cannot auto-resume, so you typically have an ADF pipeline that starts the Fabric capacity. After the capacity is started, you want to kick off your ETL pipelines in Fabric and now you can do this from ADF as well.

Click through for the process. Though do check the warnings that Koen offers: remaining in synchronous execution mode can cost you extra money, while asynchronous execution mode always returns a positive result, regardless of whether the underlying Fabric Data Pipeline actually worked.

Digging into the Kusto Detective Agency

Tom Zika becomes a gumshoe:

We need to answer this question:

Who is the detective that earned the most money in 2022?

We can see that only one table (DetectiveCases) was added in the ingestion section. Let’s take a look at its data.

The Kusto Detective Agency is a great way to sharpen your KQL skills. I’m not sure it’s a fantastic experience for somebody with zero KQL knowledge, but if you’ve messed around with the language at least a little, it’s a fun way of applying those skills.

Restoring Databases using Change Data Capture

Tim Radney restores a database:

I have recently had the privilege of working with multiple clients who have been taking advantage of Change Data Capture (CDC). Change Data Capture is a feature that utilizes SQL Server Agent to log inserts, updates, and deletes occurring in a table. It makes these data changes available to be consumed in a relational format. These changes are captured for the modified rows and stored in change tables that mirror the column structure of the tracked source tables.

Change Data Capture is very often utilized for ETL projects or for data extracts into a data warehouse, data lake, etc.

Click through for the scenario and how you can keep CDC going after restoring a SQL Server database.
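The likely centerpiece, for reference: restoring to a different server (or under a different database name) disables CDC unless you ask for it to be kept, and the Agent jobs need to be re-created afterward. The database name and path here are illustrative:

    -- KEEP_CDC retains the capture instances and change tables that a
    -- restore to another server would otherwise strip out
    RESTORE DATABASE Sales
    FROM DISK = N'S:\Backups\Sales.bak'
    WITH KEEP_CDC, RECOVERY;

    -- The capture and cleanup jobs live in msdb, not in the backup,
    -- so add them back after the restore
    USE Sales;
    EXEC sys.sp_cdc_add_job @job_type = N'capture';
    EXEC sys.sp_cdc_add_job @job_type = N'cleanup';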
