Normalization – Curated SQL

Multi-Measure Calculations in Relational Databases

Published 2025-03-05 by Kevin Feasel

Greg Low describes a common business problem:

But while food wholesale systems will need to deal with quantities like I described in that post, they often have another layer of complexity. Items are often sold by:

Quantity

Weight

Quantity and Weight

This is an interesting look at how the domain can drive what a proper solution looks like. It also seems like a good use case for 6th normal form, with unit quantity and unit weight tables to prevent NULL from cropping up.

Comments closed

A Mistake of “Normalization”

Published 2025-03-04 by Kevin Feasel

Hans-Jürgen Schönig makes an argument:

The concept of “normalization” is often the first thing people who are new to databases are going to learn. We are talking about one of the fundamental principles in the realm of databases. But what is the use of normalization in the first place? Well, we want to avoid redundancies in the data and make sure that information is stored in a way that helps reduce mistakes and inconsistencies. Ultimately, that is all there is to it: No redundancies, no mistakes, no inconsistencies.

There’s an example in this of “too much normalization” but I’m going to push back because this is a common misunderstanding of the idea of normalization.

The example covers removing price from an invoice table and having people look up the price from the product table, as having each price in an invoice is duplication, and we’re trying to eliminate duplication.

This argument is wrong, because it conflates two concepts. The listing price of an item is its current price. This is the thing you will see on a products table. The sale price of an item on the invoice table is a historical artifact and is not the same as the listing price, even if the dollar amounts match. Hans-Jürgen points out the consequence of making this mistake, and is correct in pointing this out. But it’s not “too much normalization” because it misunderstands the domain model and eliminating sale price from a table would remove information. Properly following the rules of normalization means you cannot lose information–that’s what each one of the normal forms does. In this case, we remove an attribute based on a faulty assumption that there is a functional dependency between product ID and sale price (that is, every time you see a specific product ID, you will always see a specific sale price). That’s the crux of the issue in this example, but the concept of normalization takes strays as a result of the faulty assumed functional dependency.

Comments closed

Brownfield Data Modeling

Published 2025-02-05 by Kevin Feasel

Jared Westover discusses a common trade-off:

Some decisions in life are easy, like whether to drink that second cup of coffee. But when it comes to databases, things get complicated fast. Developers often seek my input on adding tables and columns. A common question arises: Should they create a new table or expand an existing one by adding columns? This decision can be tricky because it depends on several factors, including query performance, future growth, and the complexity of implementing either solution. While adding one or two columns to an existing table may seem the easiest option, is it the best long-term solution? In this article, we look at whether it is better to add new columns versus a new table in SQL Server.

As an architectural pro-tip, when you’re looking to add a new column to an existing table, ask yourself if the new attribute you want to add actually relates to the natural key of the existing table. In Jared’s example, the natural key for video game tracker is presumably video game ID (which itself ties back to, presumably, the video game title, developer, console, and release date) and start date. Does a book actually relate to a video game and start date? No, it does not. Therefore, this book attribute does not belong on the video game tracker table.

When you dig deeper into Boyce-Codd Normal Form, you figure out that “relates to” in the prior paragraph translates to “has a functional dependency upon,” but using non-technical language for people not familiar with normalization, you can still get to the same conclusion, because ultimately, 95% of database normalization is common sense that we strenuously apply to a business domain.

And most of the time, the developer knows that this feels weird, but doesn’t want to spend the extra time doing it the best way and instead tries to do it the expedient way. This is where the role of the architect as politician comes in, and we gently guide people to the right conclusion. Or just tell them to put on their big boy britches and do it right. Either way.

Comments closed

An Overview of Normal Forms

Published 2024-07-15 by Kevin Feasel

Daniel Calbimonte talks normalization:

Various levels of normalization in SQL can be used to reduce data redundancy and have a better-structured relational data model. This tutorial looks at these various levels with explanations and examples in Microsoft SQL Server for beginners.

I disagree with part of Daniel’s explanation of 1NF: I believe that the idea of atomicity, as Daniel defines it, is not part of 1NF. I’m basing this off of CJ Date’s definition of first normal form:

Given relvar R with heading H containing attributes A1…An of types T1…Tn, all tuples follow heading H and have one value of type Ti for attribute Ai.

All this says is that we have a single value per attribute in a tuple. “LeBron James, Lakers” and “Stephen Curry, Warriors” are perfectly reasonable values for attributes in first normal form. In Database Design and Relational Theory, Date spends a few pages covering the idea of atomicity and how there’s no good explanation for what, exactly, “atomic” means. Even in Daniel’s example, you could break down player and coach names further, not only into first and last names, but also subsets of characters within those names, like syllables. The closest thing I have for atomicity is the idea that something is atomic when it is at the lowest level given a particular set of data requirements. But that’s not a mathematical rule like the rules of normalization. It’s a business rule, and necessarily fuzzier and subjective.

That said, I like the rest of Daniel’s list and appreciate going to 5th normal form.

1 Comment

Surrogate Keys and Logical Data Models

Published 2023-11-22 by Kevin Feasel

I wrap up a series on database normalization:

This video serves as a coda, covering one topic I did not include in the main series: do surrogate keys belong in the logical data model?

The short answer is, “no.” The longer answer is, “no and here’s why.”

Comments closed

Database Normalization: Abnormal Forms

Published 2023-11-16 by Kevin Feasel

I draw the logical conclusion: the opposite of normal forms is, of course, abnormal forms:

This video covers a variety of topics, effectively wrapping up the series on normalization. We look at data warehousing, including why the Kimball-style star schema is a really bad design in theory but a perfectly reasonably design in practice. We cover the chimera of “overnormalization” and I throw out a hot take. And we finally slag on denormalization.

Click through for the video.

Comments closed

The Utility of 6th Normal Form

Published 2023-11-08 by Kevin Feasel

I have a new video:

In this video, explain what Sixth Normal Form (6NF) is and why it slots in as the third most-important normal form. We look at two separate use cases in which 6NF can make sense and I provide some guidance on when 5NF is good enough versus when 6NF is better.

6th Normal Form doesn’t necessarily make sense all the time, but there are some really good use cases for it.

Comments closed

An Overview of 4th Normal Form

Published 2023-11-01 by Kevin Feasel

I continue a series on database normalization:

In this video, [I] explain what Fourth Normal Form (4NF) is and why I consider 5NF to be significantly more important. Even so, 4NF does make it easy to explain a certain common class of problem, allowing it to provide some measure of utility.

4th Normal Form is a special case of the much more exciting 5th Normal Form, but I do have a bit of a soft spot for it.

Comments closed

Embrace the Power of 5th Normal Form

Published 2023-10-25 by Kevin Feasel

I have a new video up:

In this video, we drill into the other most important normal form, learning what Fifth Normal Form (5NF) is, why Boyce-Codd Normal Form is not enough, and examples of why 5NF can be such a challenge to implement.

Until I read CJ Date’s Database Design and Relational Theory (2nd edition), my level of appreciation for 5th Normal Form was somewhat limited, but that’s mostly because I didn’t understand it well at all. I liked the connection trap example in this article, but Date’s book was the first really good explanation of 5NF and just how powerful it is. My hope is that I was successfully able to convey that power to audiences.

Comments closed

A Primer on Boyce-Codd Normal Form

Published 2023-10-18 by Kevin Feasel

I have a new video:

In this video, we drill into one of the two most important normal forms, learning what Boyce-Codd Normal Form (BCNF) is, how you can get to BCNF, and a practical example of it. We also learn why I cast so much shade on 2nd and 3rd Normal Forms.

Boyce-Codd Normal Form is one of the two most important normal forms, and I’m pretty happy with the way this video came together to explain how you can get from 1NF into BCNF, as well as the specific benefits this provides.

Comments closed

M	T	W	T	F	S	S
	1	2	3	4	5	6
7	8	9	10	11	12	13
14	15	16	17	18	19	20
21	22	23	24	25	26	27
28	29	30	31

Category: Normalization