Press "Enter" to skip to content

Day: November 19, 2025

Sales Tables and Normalization

Hans-Jürgen Schönig lays out an argument:

When creating data models in PostgreSQL (or any other powerful relational database), several common problems can arise that hinder effective database design. One frequent issue is the incorrect normalization of tables, leading to redundant data, inefficient querying capabilities, or even simply plain wrong semantics. 

First, I agree with the argument Hans-Jürgen is making regarding modeling sales data: it is important to understand what information the company is actually collecting and the domain and business rules associated with that data collection. The real world should shape and define your database model, not the other way around.

But where I fervently disagree is that this is “too much” or “over-optimistic” normalization because this is not normalization. Normalization is the lossless decomposition of data across entities, and normal forms 1-5 break down along two major lines: decompose functional dependencies (Boyce-Codd Normal Form), and decompose join dependencies (4th & 5th normal forms). In the example, there is clearly a loss of information from the wider model—the table definition Hans-Jürgen shows later on—so the split into product + sales tables does not adhere to 5th normal form.

The sales table does need some variant of product_unit_price because that was the price as of the time of sale, and prices can change for a variety of reasons: discounts because the product is on sale, adjustments due to inflation, increases in demand, supply chain issues, adjustments based on new marketing models, etc. For historical purposes, it is important that we capture the price as of the time of sale, as that will tell how much to refund if the customer returns with the product. It is also important for bookkeeping reasons, as a variety of taxes and reports depend upon that number.

But it’s also clear that the product table is reasonable: a product number (following the first rule of database architecture: anything labeled “number” is actually a string with non-numeric characters in it), a product name, and a product price. But this is the current sales price, which is different from the price at the time of sale. Those are two separate attributes and labeling them both “product_unit_price” leads to some of that confusion. They should both be in the data model.

Then, if I’m looking critically at the data model, I would ask why product name is in the sales data. Do product names change so frequently that we need to capture it on every sales record? Do managers have the authority to change the name of a product on the fly during a sale? Probably not and unless managers do have that discretion or the product name changes on a per-sale basis, it doesn’t belong there. There could be a functional dependency from product_number to product_name, such that each unique value of product_name has one and only one product_number. In that case, the product table should have product_name and the sales table should not.

But what if the product name does change over time and we need to track those changes? Then I’d (potentially) relabel product_name on the product table to current_product_name and create a 6th normal form table called something like product_name_history. This table would include the product_number, the product_name, and a time range over which that name was valid for that product. These timeframes should not overlap and, as long as the product exists, there should be an entry that corresponds with the appropriate name at any arbitrary time.

My final model would look something like:

CREATE TABLE product (
    product_number text PRIMARY KEY,
    product_name text NOT NULL,
    product_price numeric(10, 2) NOT NULL
);

CREATE TABLE product_name_history (
    -- Primary key would be on the combo of product_number and begin_time/interval
    product_number text NOT NULL,
    product_name text NOT NULL,
    -- Architected as an interval, implemented as start and end timestamps
    begin_time timestamptz NOT NULL,
    end_time timestamptz NOT NULL
);

CREATE TABLE sales (
    sales_time timestamptz NOT NULL,
    product_number text NOT NULL,
    sale_unit_price numeric(10, 2) NOT NULL,
    sale_units numeric(10, 2) NOT NULL
);

I did also rename “product_unit_price” and “product_units” to “sale_unit_price” and “sale_units” to make it clearer that these were associated with the sale itself. We could also get into things like sales orders versus sale order line items, additional attributes, etc. but this post is already long enough as it is. And again, this assumes that product name changes frequently enough and the different names are relevant enough for us to care about what the product name was at the time of a given sale. If not, we could remove the product_name_history table without otherwise changing the model.

Long story short, I agree with Hans-Jürgen that database design is important and understanding the business domain is critical for designing a good data model. But where I disagree is with the concept of over-normalization. That’s not a real thing. Improper attempts at normalization are certainly a real thing, as we see in this post, and it is important to understand the business rules as well as the rules of defining normal forms in order to prevent this.

Leave a Comment

Running SQL Server 2025 and SSMS in MacOS Tahoe

Randolph West shares some notes:

We Apple product folks have been through a journey with the Microsoft data platform and Arm64 support, haven’t we? When the M1-based Apple silicon laptops were released, we couldn’t run Windows x64 in a virtual machine at all. Then Parallels managed to get a build of Windows 11 Arm64 going, but that meant we couldn’t run SQL Server … reliably, anyway. Some versions could install sometimes, but it was unpredictable.

It looks like neither product runs perfectly, but they both work better than I expected.

Leave a Comment

Microsoft Fabric November 2025 Feature Summary

Adam Saxton has a long list:

The November 2025 Fabric release introduces several major updates, including the general availability of SQL database, Cosmos DB, and enhanced mirroring support for key data sources such as SQL Server, Cosmos DB, and PostgreSQL.

This month also brings new AI-driven features like Copilot sidecar chat tools and real-time data exploration, as well as crucial platform enhancements such as Azure DevOps cross-tenant support, improved security permissions in OneLake, and expanded connectivity through new connectors and developer tooling. These updates are designed to empower users with greater flexibility, intelligence, and control across the Fabric platform.

When the table of contents is roughly three pages, you know it’s either Ignite or Build.

Leave a Comment

SQL Database in Fabric now GA

Anna Hoffman and Idris Motiwala make an announcement:

SQL is everywhere and Microsoft is innovating to deliver a unified experience across on-premises, cloud, and SaaS. One SQL unifies your data estate, bringing platform consistency, performance at scale, advanced security, and AI-ready tools together in one seamless experience, and SQL database in Fabric is no exception to that. At Microsoft Ignite, we’re thrilled to announce SQL database in Microsoft Fabric is officially Generally Available!

This is definitely a fluffy post, though Anna does have some linked videos that go into more detail.

Leave a Comment

SQL Server 2025 Now GA

A quick roundup on some big news from Microsoft Ignite.

John Deardurff hits the marketing materials:

The wait is over! SQL Server 2025 is officially here, and it’s more than just another version. I’ve been working with it during the last six months it was in public preview, and some of the features announced still surprised me. Let’s break down what makes this release so exciting and why it’s a game-changer for businesses and developers alike.

Andy Yun is pleased that Standard Edition just got a lot more powerful:

While all of the new features and such were available during Public Preview, one thing that was not public until now is that Standard Edition limits have been increased! Yay! The CPU core count limit is now 32 cores (or 4 sockets, whichever is lesser) and the max buffer pool memory per instance is now 256GB! Additionally, Resource Governor is now available for Standard Edition. And in SQL Server 2025, Resource Governor can also help you manage TempDB!

Reitse Eskens goes into several useful features, including Express Edition love:

You may have seen earlier blogs from me on SQL Server 2025; I won’t repeat that content in full, but I’ll provide a brief summary. You can find the full blogs here.

Something I wasn’t able to mention before, but you can now find in the documentation, is that SQL Server Express has upgraded its game! No longer a limit of 10 GB for the database, but 50 GB. This makes it much more useful in many scenarios.

Brent Ozar lays out a bunch of features as well:

If you use columnstore indexes, 2025 continues upon every release’s investments, with a bunch of improvements to make management easier on ordered indexes. In my early tests with clients, we’ve seen massive improvements in easier, faster, more online maintenance that we simply couldn’t do before.

Leave a Comment