Janani Annur Thiruvengadam stands some common advice on its head:
If you’ve worked with Apache Spark, you’ve probably heard the conventional wisdom: “Use coalesce() instead of repartition() when reducing partitions — it’s faster because it avoids a shuffle.” This advice appears in documentation and blog posts, and is repeated across Stack Overflow threads. But what if I told you this isn’t always true?

In a recent production workload, I discovered that using repartition() instead of coalesce() resulted in a 33% performance improvement (16 minutes vs. 23 minutes) when writing data to fewer partitions. This counterintuitive result reveals an important lesson about Spark’s Catalyst optimizer that every Spark developer should understand.
Read on for the details on that scenario.