Press "Enter" to skip to content

Author: Kevin Feasel

A Hidden Slowdown in PostgreSQL Restores

Warda Bibi explains why a large PostgreSQL database restoration can take a long time:

In July 2025, during the PG19-1 CommitFest, I reviewed a patch targeting the lack of parallelism when adding foreign keys in pg_restore. Around the same time, I was helping a client with a large production migration where pg_restore dragged on for more than 24 hours and crashed multiple times.

In this blog, I will talk about the technical limitations in PostgreSQL, the proposed fix, and a practical workaround for surviving large restores.

The reason for the slowdown is a bit wild to me, though I suppose somebody could edit the backup files manually after pg_dump writes them out.

Leave a Comment

Worst-Case Testing for Direct Lake Semantic Models

Chris Webb updates a prior post:

Two years ago I wrote a detailed post on how to do performance testing for Direct Lake semantic models. In that post I talked about how important it is to run worst-case scenario tests to see how your model performs when there is no model data present in memory, and how it was possible to clear all the data held in memory by doing a full refresh of the semantic model. Recently, however, a long-awaited performance improvement for Direct Lake has been released which means a full semantic model refresh may no longer page all data out of memory – which is great, but which also makes running performance tests a bit more complicated.

Read on to learn more about the improvement as well as how you can still perform your performance testing.

Leave a Comment

Choosing between Random Forest and Gradient Boosting

Jayita Gulati compares two popular algorithms for classification:

When working with machine learning on structured data, two algorithms often rise to the top of the shortlist: random forests and gradient boosting. Both are ensemble methods built on decision trees, but they take very different approaches to improving model accuracy. Random forests emphasize diversity by training many trees in parallel and averaging their results, while gradient boosting builds trees sequentially, each one correcting the mistakes of the last.

This article explains how each method works, their key differences, and how to decide which one best fits your project.

Click through for the explanation.

Leave a Comment

SQL Injection Vulnerabilities Fixed in SQL Server

Vlad Drumea tests out a pair of fixes:

In this post I demo two PoCs for SQL injection vulnerabilities fixed in SQL Server 2025 CU20 GDR KB5063814.

This August’s Patch Tuesday came with a security patch for SQL Server 2022, 2019, 2017, and 2016.

The number of SQL injection vulnerabilities caught my attention and I decided to see what system stored procedures have changed to see if I can find anything useful.

Vlad looks at a pair of spatial stored procedures and puts together a method to exploit the old versions.

Leave a Comment

Date Intervals in PostgreSQL Window Functions

Hubert Lubaczewski solves a problem:

Since I can’t copy paste the text, I’ll try to write what I remember:

Given table sessions, with columns: user_id, login_time, and country_id, list all cases where single account logged to the system from more than one country within 2 hour time frame.

The idea behind is that it would be a tool to find hacked account, based on idea that you generally can’t change country within 2 hours. Which is somewhat true.

Solution in the blogpost suggested joining sessions table with itself, using some inequality condition. I think we can do better…

Click through for a solution that works for PostgreSQL but not SQL Server because the latter doesn’t offer date and time intervals on window function frames.

To do this in SQL Server, I’d probably use LAG() and get the prior value of country ID and the prior login time. Something like the following query, though I didn’t run detailed performance checks.

WITH records AS
(
	SELECT
		s.user_id,
		s.login_time,
		s.country_id,
		LAG(s.login_time) OVER (PARTITION BY s.user_id ORDER BY s.login_time) AS prior_login_time,
		LAG(s.country_id) OVER (PARTITION BY s.user_id ORDER BY s.login_time) AS prior_country_id
	FROM sessions s
)
SELECT *
FROM records r
WHERE
	r.prior_country_id <> r.country_id
	AND DATEDIFF(HOUR, r.prior_login_time, r.login_time) <= 2;
Leave a Comment

Trying Time-Bound Extended Events in SQL Server 2025 RC0

Reitse Eskens check the state of the art:

What’s new in SQL 2025 is that the team behind SQL Server 2025 has addressed an issue that I’ve encountered once or twice as well. If you start an extended event, it will run until you stop it. Yes, you have to manually stop it. If you don’t, it will stop eventually. Together with some other services, because the disk space has run out. In my case, I was fortunate that the servers were monitored and alerts were triggered due to low disk space.

In this new version, starting with Release Candidate 0 (RC0), you can configure the duration.

Click through for a demo, as well as current limitations (generally around this not being in the UI yet).

Leave a Comment

Generating Calendar Dates in T-SQL

Rick Dobson writes some code:

Sometimes it is convenient to generate calendar dates, such as for Fridays across multiple years. This tip demonstrates two different solutions for this kind of task. The demonstrations implement recursive CTEs, chained CTEs, as well as a stored procedure with both input and output parameters. These two different demonstrations are described so that you can incorporate either or both into your own calendar date generation projects.

Honestly, this is where a good calendar table (or date dimension) would be extremely handy.

Leave a Comment

Time Series Forecasting in Python

Myles Mitchell builds an ARIMA model:

In time series analysis we are interested in sequential data made up of a series of observations taken at regular intervals. Examples include:

  • Weekly hospital occupancy
  • Monthly sales figures
  • Annual global temperature

In many cases we want to use the observations up to the present day to predict (or forecast) the next N time points. For example, a hospital could reduce running costs if an appropriate number of beds are provisioned.

Read on for a primer on the topic, a quick explanation of ARIMA, and a sample implementation using several Python packages.

Leave a Comment

Architectural Guidance for IoT Deployments in Azure

Bhimraj Ghadge shares some tips:

Edge computing, a strategy for computing on location where data is collected or used, allows IoT data to be gathered and processed at the edge, rather than sending the data back to a data center or cloud. Together, IoT and edge computing are a powerful way to rapidly analyze data in real-time.

In this Tutorial, I am trying to lay out the components and considerations for designing IoT solutions based on Azure IoT and services.

Read on for an overview of IoT components in Azure, as well as several things to keep in mind during systems design and implementation.

Leave a Comment