2026-06-23 – Curated SQL

Spark DataFrameWriters

Published 2026-06-23 by Kevin Feasel

Miles Cole compares two generations of DataFrameWriter:

Most Spark developers learn to write data with df.write long before they ever encounter df.writeTo. It is simple, familiar, and everywhere: choose a format, pick a mode, add a few options, and save the result to a table or path. For years, that mental model worked well enough. Spark was often writing files first and tables second.

But modern lakehouse systems have changed the contract.

Read on to learn how, and what common problem the DataFrameWriterV2 is there to solve.
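
As a rough illustration of the two APIs being compared (not taken from Miles's post), the sketch below appends a DataFrame to a table with each writer. The helper names and the table name in the usage note are hypothetical. The calls assume PySpark 3.1 or later, where DataFrame.writeTo is available, and the second helper assumes the target table already exists in a catalog that supports the v2 writer.

```python
def append_with_v1(df, table_name):
    """Append rows using the original DataFrameWriter (df.write)."""
    # The operation is chosen with a mode string on a general-purpose builder
    # that can end in either a path write (save) or a table write (saveAsTable).
    df.write.mode("append").saveAsTable(table_name)


def append_with_v2(df, table_name):
    """Append rows using DataFrameWriterV2 (df.writeTo)."""
    # The target table is named first, and the operation is an explicit
    # method call such as append(), create(), or overwritePartitions().
    df.writeTo(table_name).append()
```

With a live SparkSession, either helper could be called as append_with_v2(df, "demo.events"), where demo.events is a placeholder table name.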


The Importance of Testing Received Wisdom

Published 2026-06-23 by Kevin Feasel

Mark Wilkinson lays out an argument:

Life is full of “absolutes”. For example, the Star Trek: The Next Generation episode “The Measure of a Man” is often cited as the best episode of the series, and many folks will tell you that you should never adjust max worker threads. But once you take the time to dig in, you realize that “Darmok” is in FACT the best episode of ST:TNG, and you’ll also find a small cohort of folks adjusting max worker threads on all of their SQL Server instances. Are these people just obnoxious contrarians? No. They just did their own testing to validate the common wisdom.

Click through for an example from Mark around 64K allocation unit sizes for NTFS volumes. And I’ll give one on max worker threads. I had a consulting client at one point that had per-customer databases. Each customer was, in general, quite small, so they had thousands of databases on the instance. They also wanted high availability on the system, so they wanted each database mirrored to a different server.

If they hadn’t spiked max worker threads to extreme levels, the server would have fallen over simply from the weight of all of the open database mirroring connections. The actual server workload was fine and it could handle all of the open worker threads because the large majority were doing nothing. But if a zealous problem-solver had popped in, run a diagnostic, seen that they were violating “best practices,” and “fixed” the problem, that would have been a bad day.

Unrelated but similar story: the one time they did need to fail over due to an emergency, it was also a bad day. Because even if the instances can handle 2500+ databases, it turns out that having them all fail over at the same time on low-powered Azure hardware was not a pleasant experience.


Optimizing Polymorphic Associations in Postgres

Published 2026-06-23 by Kevin Feasel

Andrei Lepikhov continues a thread:

Recently, I looked into how common polymorphic associations actually are in relational databases — a performance-hostile pattern built around a discriminated foreign key that ORMs (Rails, Django, Hibernate), CRM platforms (Salesforce), and 1C generate automatically. The front page of a typical online store, or the activity feed of a CRM, is built by exactly this kind of query: a base table is LEFT JOIN-ed to every possible subtype through a (type, id) pair of columns.

That earlier article answered the question ‘how widespread is this pattern?’ After all, if you’re going to improve something, it helps to know how useful the improvement will be, right? Here, I want to give a sense of how this pattern leads to performance regressions and point out directions in the PostgreSQL optimiser that could make the situation easier.

Much of this is speculative, but the three proposed solutions are all interesting.
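
For readers who have not met the pattern, here is a minimal sketch of the query shape Andrei describes: a base table LEFT JOIN-ed to each subtype through a (type, id) pair. It uses Python's built-in sqlite3 module rather than PostgreSQL so that it runs anywhere, and the table and column names are invented for illustration. It shows the shape of the query only and says nothing about how any particular optimizer plans it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT);
    -- (item_type, item_id) is the discriminated foreign key: item_type says
    -- which table item_id points into, so no ordinary FK can be declared.
    CREATE TABLE feed_items (
        id        INTEGER PRIMARY KEY,
        item_type TEXT    NOT NULL,
        item_id   INTEGER NOT NULL
    );
    INSERT INTO products VALUES (1, 'Kettle');
    INSERT INTO articles VALUES (1, 'Descaling guide');
    INSERT INTO feed_items VALUES (10, 'product', 1), (11, 'article', 1);
""")

# One LEFT JOIN per possible subtype; at most one of them matches each row.
rows = conn.execute("""
    SELECT f.id, f.item_type, COALESCE(p.name, a.title) AS label
    FROM feed_items AS f
    LEFT JOIN products AS p ON f.item_type = 'product' AND p.id = f.item_id
    LEFT JOIN articles AS a ON f.item_type = 'article' AND a.id = f.item_id
    ORDER BY f.id
""").fetchall()

for row in rows:
    print(row)
```

Every additional subtype adds another LEFT JOIN of this form, which is why the number of joins grows with the number of subtypes even though each base row matches only one of them.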


Patched SQL Injection Vulnerability in sys.sp_dbmmonitorupdate

Published 2026-06-23 by Kevin Feasel

Fabiano Amorim digs into a fixed issue:

What makes this case particularly interesting is not just that the vulnerability exists in a trusted system object, but how it works: the injection bypasses a REPLACE-based sanitization attempt through a subtle Unicode character conversion that happens silently during a variable assignment.

The vulnerability was reported to Microsoft and they have since fixed it, but it’s still worth exposing and explaining given how intricate it is. So, that’s what I’ll do in this article.

Click through to see how it works. And of course this database mirroring stored procedure is still hanging around long after database mirroring itself was deprecated. But that’s the downside to deprecation without subsequent removal.
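
To make the general class of bug concrete, here is a toy model in Python. It is not the stored procedure's code and not Fabiano's proof of concept. It only shows why escaping quotes before a character conversion can be undone by that conversion. Unicode NFKC normalization stands in for the silent conversion, which is an assumption made purely so the example runs with the standard library; the mechanism inside SQL Server is different and is covered in the linked article.

```python
import unicodedata

FULLWIDTH_APOSTROPHE = "\uff07"  # looks like a quote, but is not U+0027


def naive_sanitize(value):
    """REPLACE-style defence: double every ASCII single quote."""
    return value.replace("'", "''")


def silent_conversion(value):
    """Stand-in for a conversion applied after sanitization.

    NFKC folds the fullwidth apostrophe (U+FF07) into a plain ASCII one.
    """
    return unicodedata.normalize("NFKC", value)


raw = "O" + FULLWIDTH_APOSTROPHE + "Brien"
sanitized = naive_sanitize(raw)           # nothing to double, so unchanged
converted = silent_conversion(sanitized)  # a bare ASCII quote appears here

print(sanitized == raw)
print(converted)
```

The order of operations is the problem: the check runs against one representation of the string, and the value that is eventually used is another.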


Inserting More than a Thousand Rows at a Time via VALUES()

Published 2026-06-23 by Kevin Feasel

Andy Brownsword has a solution:

INSERT statement exceeds the maximum allowed number of 1000 row values

Folks may tell you there are better ways to solve the issue, but the likelihood is that you’re doing it this way for a particular reason. So let’s skip the spiel and jump straight to a simple solution.

Click through to see what you can do to get around that issue.
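
For context, one generic workaround is to split the rows into several INSERT statements of at most 1,000 row values each. This may or may not be the approach Andy takes, and the helper names and the dbo.Demo table below are made up for illustration. The sketch builds literal SQL text, so it assumes trusted input.

```python
def sql_literal(value):
    """Render a Python value as a T-SQL literal (None, bool, number, or text)."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return "1" if value else "0"
    if isinstance(value, (int, float)):
        return str(value)
    # Treat everything else as text and double any embedded single quotes.
    return "N'" + str(value).replace("'", "''") + "'"


def batched_inserts(table, columns, rows, batch_size=1000):
    """Yield INSERT statements containing at most batch_size row values each."""
    column_list = ", ".join(columns)
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        values = ",\n".join(
            "(" + ", ".join(sql_literal(v) for v in row) + ")" for row in batch
        )
        yield f"INSERT INTO {table} ({column_list})\nVALUES\n{values};"
```

Calling list(batched_inserts("dbo.Demo", ["Id", "Name"], rows)) on 2,500 rows yields three statements, the last of which carries the remaining 500 rows.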

