Category: Constraints

Building State Transitions as SQL Constraints

Published 2025-06-24 by Kevin Feasel

About two decades ago, I introduced the concept of transition constraints to show Data Validation in a database is a lot more complex than seeing if a string parameter really is an integer. In October of 2008, I did an article called Constraint Yourself! on how to use DDL constraints to assure data integrity. One of the topics in that piece was a look at state transition constraints via an auxiliary table.

Read on for an interesting dive into the topic.

Comments closed

Domains in ANSI SQL

Published 2025-03-27 by Kevin Feasel

Joe Celko describes a domain:

For example, if there though is that there is a domain called voltage which has a base unit called “volt” that’s otherwise meaningless. Yes, you can get a voltmeter you can watch the needle, you can be told what the IEEE specification for defining how much work a volt should do or shock you. I’ve discussed scales and types of measurements in a previous article, It’s worth mentioning that you should not confuse domain with the representation and symbols of the units being used. Some domains are limited, such as degrees that measure planar angles. An angle can be from 0 to 360°, or it can be between zero and 2π radians.

Joe has an explanation but doesn’t have any concrete examples in psql. Here’s one from the PostgreSQL documentation:

CREATE DOMAIN us_postal_code AS TEXT
CHECK(
   VALUE ~ '^\d{5}$'
OR VALUE ~ '^\d{5}-\d{4}$'
);

The idea of a domain here is that you define a valid slice of some data type. We can do something similar with check constraints on an attribute, but the difference is that we’d need to create the check constraint for each relevant attribute, whereas the domain would include this check automatically, making it quite useful if we have multiple instances of, say, us_postal_code in our database. Then, we wouldn’t need to worry about creating a check constraint on each instance and ensuring that the code remains the same across the board.

This also leads to a very common sentiment in functional programming: make invalid states unrepresentable. In other words, make it impossible for a person or piece of code to generate a result in an invalid state. By defining a domain with the scope of our valid state, we make it impossible for someone to create a US postal code value that does not pass our check, and so we can’t have dirty data of this sort in our database.

Comments closed

Truncating All Tables while Preserving Foreign Keys in T-SQL

Published 2025-01-28 by Kevin Feasel

Ronald Kraijesteijn builds a script:

When testing a data warehouse, a common challenge is managing large datasets effectively. Often, you need to reset tables to a clean state, ensuring consistent testing environments. The most efficient way to clear a table is using the SQL command TRUNCATE TABLE. However, this command is not straightforward when foreign key constraints are present. In this article, we’ll explore a solution that temporarily disables constraints, allows truncation, and then restores the constraints—keeping your data model intact.

Click through for the script, which saves a record of all of the foreign key constraints, truncates each table, and then re-creates the foreign keys.

Comments closed

Thoughts on Primary and Foreign Key Constraints

Published 2025-01-14 by Kevin Feasel

Rob Farley lays out an argument:

I am NOT suggesting that data integrity is irrelevant. Not at all. But how often do we need an enforced primary key or foreign key?

Be warned – I’m not actually going to come to a conclusion in this post. I’m going to make cases for both sides, and let you choose where you stand. The T-SQL Tuesday topic this month is Integrity, so some other people might have written on a similar topic, and produce even more compelling arguments one way or another. I’m the host this time, so you’ll be able to find the round-up (once it’s there) here on the LobsterPot Solutions site too.

I will come to a conclusion and it is that OLTP systems need primary and foreign key constraints to work properly. In the post, Rob asks a question around the last time I saw a key violation error in my application. The good(?) news is that I have plenty of them in the last application I built on SQL Server, because I need to rely on a source system that dumps data and doesn’t actually check to see if existing records were there already. That means I can’t simply perform an inner join from my table to the source table, because I could get multiple records back. No, instead, I need to use a common table expression or APPLY operator, retrieve the max values from the flotsam and jetsam that exists, and make my code harder to follow and perform worse as a result.

Distributed warehousing systems don’t have enforceable keys because of the technical challenge of enforcing keys without having different nodes talk to each other. But these things also assume either that you’ve pre-validated all of the data (like in a Kimball model), that you don’t care about duplicate records or messiness, or that you’ll fix the problem again somewhere downstream. Which, in the case of Microsoft Fabric, is typically necessary by the time you put the data into a semantic model, as those things really don’t like duplicate records and this tends to mess up relationships between tables.

Comments closed

Dealing with Duplicate Data via ROW_NUMBER()

Published 2025-01-10 by Kevin Feasel

Andy Brownsword removes the duplicates:

Data quality and consistency is key to the services we support and solutions we deliver. A gremlin which can undermine that is duplicate data. Let’s start the new year dealing with duplicate data and having a good clear-out.

For our example we’ll consider an Order Product table which contains an OrderID and ProductID, and the combination of these should be unique. Other fields for the duplicate records may differ so we may want to be selective about which records are removed.

This is where I get on my high horse and complain about laziness in data modeling, a very common problem. This takes nothing away from Andy’s post, which is a good method for solving a problem that has gotten out of hand. But if you know that some combination of attributes is unique, add a unique key constraint or a unique non-clustered index right then and there. Doing so will prevent improper duplicate data from ever being an issue. If you don’t know that some combination of attributes must be unique, discuss this with the business side in a way that makes sense for them. Yes, there’s always the risk that you’ll have a conversation later like, “Oh, it turns out that this really should be unique,” but in most cases, you can easily sort this kind of thing out up-front and save a lot of time and effort later on.

Comments closed

Database Nostalgia

Published 2024-12-12 by Kevin Feasel

Louis Davidson looks back on the past:

I will admit that the feature I will mention probably won’t be relevant to you, as it has been nearly 30 years since this feature was introduced. But since this is the most nostalgic time of the year, I want to go back, through the mists of times, to a time where we hadn’t even quite begun to fear the changing of the centuries. To a time when we a single CPU was all we had even thought of, computer screens were often colorless, a few megabytes of RAM was all that was needed. It was a simpler time, though to be fair, it was also a time before we had sports on our phones.

Louis goes on to reminisce about an implementation of database quality guarantees in a world without foreign key constraints. Which is usually about the time I rag on MySQL for its insistence that foreign keys weren’t important to the product until relatively late in its existence (version 3.23, released in 2000).

Comments closed

Unique Constraint Violations Cause Bloat in PostgreSQL

Published 2024-12-05 by Kevin Feasel

Josef Machytka shows something annoying:

The issue of table and index bloat due to failed inserts on unique constraints is well known and has been discussed in various articles across the internet. However, these discussions sometimes lack a clear, practical example with measurements to illustrate the impact. And despite the familiarity of this issue, we still frequently see this design pattern—or rather, anti-pattern—in real-world applications. Developers often rely on unique constraints to prevent duplicate values from being inserted into tables. While this approach is straightforward, versatile, and generally considered effective, in PostgreSQL, inserts that fail due to unique constraint violations unfortunately always lead to table and index bloat. And on high-traffic systems, this unnecessary bloat can significantly increase disk I/O and the frequency of autovacuum runs. In this article, we aim to highlight this problem once again and provide a straightforward example with measurements to illustrate it. We suggest simple improvement that can help mitigate this issue and reduce autovacuum workload and disk I/O.

I’ll sometimes create unique constraints in SQL Server and set the flag to ignore duplicates, most often in queue tables or situations where I know whatever gets inserted will always be the same, but that multiple agents could act to insert a record and the calling code doesn’t have any sort of protection. The idea that this pattern can hurt you in PostgreSQL seems weird to me.

Comments closed

GiST Indexes and Range Queries in PostgreSQL

Published 2024-12-03 by Kevin Feasel

Lee Asher can’t be limited to a single point:

Our Part I query used the following WHERE clause:

WHERE tsrange(o.start_time, o.end_time) && tsrange(p.enter, p.leave)

The “tsrange()” functions return timestamp ranges. But overlap queries aren’t limited to timestamps; they can be constructed from integers and floating-point values too. Imagine an arbitrage database that tracks the minimum and maximum price paid for a commodity.

Read on for examples of other types of ranges, preventing range intersection, and more.

Comments closed

Metadata-Only NOT NULL Column Insertion

Published 2024-11-20 by Kevin Feasel

Andy Brownsword has the need for speed:

When adding a new column and wanting to default the value for existing records, it used to be a painful task. As of SQL Server 2012 that became much easier.

But nobody told me, until Simon casually mentioned it in conversation recently. I had to see it for myself, so I thought I’d share for those who weren’t aware.

Read on to see how. I rarely self-promote in other people’s blog posts (hush, person who knows all the times I’ve done it), but I do have a talk on the topic of near-zero downtime database deployment strategies which includes this and quite a few other notes on what you can do without blocking others. For these sorts of changes, what you’re looking for is asynchronous processing and a Sch-M (schema modification) lock at the very end, such as when rebuilding an index with ONLINE = ON in Enterprise Edition. Alternatively, look for a Sch-M lock only on a metadata table and not the actual data. Andy’s post is an example of the latter.

Comments closed

Benefits of Indexing Foreign Key Columns

Published 2024-11-07 by Kevin Feasel

Andy Brownsword creates an index:

Last week I shared a script to identify foreign keys which weren’t indexed. What we didn’t discuss in detail was why it can be important to index these columns.

Here we’ll demonstrate the two areas we can see great benefits by introducing the indexes.

Read on to learn more about these two scenarios.

Comments closed