Press "Enter" to skip to content

Category: Constraints

Thoughts on Primary and Foreign Key Constraints

Rob Farley lays out an argument:

I am NOT suggesting that data integrity is irrelevant. Not at all. But how often do we need an enforced primary key or foreign key?

Be warned – I’m not actually going to come to a conclusion in this post. I’m going to make cases for both sides, and let you choose where you stand. The T-SQL Tuesday topic this month is Integrity, so some other people might have written on a similar topic, and produce even more compelling arguments one way or another. I’m the host this time, so you’ll be able to find the round-up (once it’s there) here on the LobsterPot Solutions site too.

I will come to a conclusion, and it is that OLTP systems need primary and foreign key constraints to work properly. In the post, Rob asks when we last saw a key violation error in an application. The good(?) news is that I have plenty of them in the last application I built on SQL Server, because I rely on a source system that dumps data and doesn’t check whether existing records are already there. That means I can’t simply perform an inner join from my table to the source table, because I could get multiple records back. Instead, I need to use a common table expression or the APPLY operator, retrieve the max values from the flotsam and jetsam that exist, and make my code harder to follow and slower as a result.
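To make that workaround concrete, here is a rough sketch with hypothetical table and column names (dbo.SourceDump and friends are mine, not the real system’s):

-- Take only the most recent row per key from the duplicate-laden source,
-- rather than joining directly and fanning out into multiple matches.
SELECT t.SomeKey, s.SomeValue
FROM dbo.MyTable AS t
CROSS APPLY
(
    SELECT TOP (1) src.SomeValue
    FROM dbo.SourceDump AS src
    WHERE src.SomeKey = t.SomeKey
    ORDER BY src.LoadDate DESC
) AS s;

A ROW_NUMBER() in a common table expression gets you to the same place; either way, the engine does more work than it would with a clean one-to-one join.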

Distributed warehousing systems don’t have enforceable keys because of the technical challenge of enforcing keys across nodes without making those nodes constantly talk to each other. But these systems also assume either that you’ve pre-validated all of the data (as in a Kimball model), that you don’t care about duplicate records or messiness, or that you’ll fix the problem again somewhere downstream. In the case of Microsoft Fabric, that downstream fix typically becomes necessary by the time you put the data into a semantic model, as semantic models really don’t like duplicate records, and duplicates tend to mess up relationships between tables.


Dealing with Duplicate Data via ROW_NUMBER()

Andy Brownsword removes the duplicates:

Data quality and consistency is key to the services we support and solutions we deliver. A gremlin which can undermine that is duplicate data. Let’s start the new year dealing with duplicate data and having a good clear-out.

For our example we’ll consider an Order Product table which contains an OrderID and ProductID, and the combination of these should be unique. Other fields for the duplicate records may differ so we may want to be selective about which records are removed.
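The core of the technique, sketched here with assumed names (dbo.OrderProduct and OrderProductID are my stand-ins, not necessarily Andy’s exact script), looks something like this:

-- Number the rows within each OrderID/ProductID pair and delete everything
-- past the first one. The ORDER BY decides which duplicate survives.
WITH Duplicates AS
(
    SELECT
        ROW_NUMBER() OVER
        (
            PARTITION BY op.OrderID, op.ProductID
            ORDER BY op.OrderProductID
        ) AS RowNum
    FROM dbo.OrderProduct AS op
)
DELETE FROM Duplicates
WHERE RowNum > 1;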

This is where I get on my high horse and complain about laziness in data modeling, a very common problem. This takes nothing away from Andy’s post, which is a good method for solving a problem that has gotten out of hand. But if you know that some combination of attributes is unique, add a unique key constraint or a unique non-clustered index right then and there. Doing so will prevent improper duplicate data from ever being an issue. If you don’t know that some combination of attributes must be unique, discuss this with the business side in a way that makes sense for them. Yes, there’s always the risk that you’ll have a conversation later like, “Oh, it turns out that this really should be unique,” but in most cases, you can easily sort this kind of thing out up-front and save a lot of time and effort later on.
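For illustration, using the same assumed table, the up-front fix is a one-liner in either form. You would, of course, need to clear out existing duplicates (as Andy does above) before either statement will succeed.

-- Declare the business rule once and let the engine enforce it from then on.
ALTER TABLE dbo.OrderProduct
    ADD CONSTRAINT UQ_OrderProduct_Order_Product UNIQUE (OrderID, ProductID);

-- Or, if you prefer an index (functionally equivalent for enforcement):
CREATE UNIQUE NONCLUSTERED INDEX UX_OrderProduct_Order_Product
    ON dbo.OrderProduct (OrderID, ProductID);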


Database Nostalgia

Louis Davidson looks back on the past:

I will admit that the feature I will mention probably won’t be relevant to you, as it has been nearly 30 years since this feature was introduced. But since this is the most nostalgic time of the year, I want to go back, through the mists of time, to a time where we hadn’t even quite begun to fear the changing of the centuries. To a time when a single CPU was all we had even thought of, computer screens were often colorless, and a few megabytes of RAM was all that was needed. It was a simpler time, though to be fair, it was also a time before we had sports on our phones.

Louis goes on to reminisce about an implementation of database quality guarantees in a world without foreign key constraints. Which is usually about the time I rag on MySQL for its insistence that foreign keys weren’t important to the product until relatively late in its existence (version 3.23, released in 2000).


Unique Constraint Violations Cause Bloat in PostgreSQL

Josef Machytka shows something annoying:

The issue of table and index bloat due to failed inserts on unique constraints is well known and has been discussed in various articles across the internet. However, these discussions sometimes lack a clear, practical example with measurements to illustrate the impact. And despite the familiarity of this issue, we still frequently see this design pattern—or rather, anti-pattern—in real-world applications.

Developers often rely on unique constraints to prevent duplicate values from being inserted into tables. While this approach is straightforward, versatile, and generally considered effective, in PostgreSQL, inserts that fail due to unique constraint violations unfortunately always lead to table and index bloat. And on high-traffic systems, this unnecessary bloat can significantly increase disk I/O and the frequency of autovacuum runs.

In this article, we aim to highlight this problem once again and provide a straightforward example with measurements to illustrate it. We suggest a simple improvement that can help mitigate this issue and reduce autovacuum workload and disk I/O.

I’ll sometimes create unique constraints in SQL Server and set the flag to ignore duplicates, most often in queue tables or situations where I know whatever gets inserted will always be the same, but that multiple agents could act to insert a record and the calling code doesn’t have any sort of protection. The idea that this pattern can hurt you in PostgreSQL seems weird to me.
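For reference, the SQL Server pattern I have in mind looks like this (queue table and column names are hypothetical):

-- Duplicate inserts are silently discarded with a warning rather than an error,
-- which suits idempotent queue writers that may race one another.
CREATE UNIQUE NONCLUSTERED INDEX UX_WorkQueue_MessageID
    ON dbo.WorkQueue (MessageID)
    WITH (IGNORE_DUP_KEY = ON);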


GiST Indexes and Range Queries in PostgreSQL

Lee Asher can’t be limited to a single point:

Our Part I query used the following WHERE clause:

WHERE tsrange(o.start_time, o.end_time) && tsrange(p.enter, p.leave)

The “tsrange()” functions return timestamp ranges. But overlap queries aren’t limited to timestamps; they can be constructed from integers and floating-point values too. Imagine an arbitrage database that tracks the minimum and maximum price paid for a commodity.

Read on for examples of other types of ranges, preventing range intersection, and more.
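As a quick illustration of the “preventing range intersection” piece, one common PostgreSQL approach is an exclusion constraint backed by a GiST index. The table below is a hypothetical commodity example of mine, not one taken from Lee’s article:

-- btree_gist lets the GiST exclusion constraint include the equality column.
CREATE EXTENSION IF NOT EXISTS btree_gist;

CREATE TABLE commodity_price_band
(
    commodity_id integer  NOT NULL,
    price_band   numrange NOT NULL,
    -- No two rows for the same commodity may have overlapping price ranges.
    EXCLUDE USING gist (commodity_id WITH =, price_band WITH &&)
);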


Metadata-Only NOT NULL Column Insertion

Andy Brownsword has the need for speed:

When adding a new column and wanting to default the value for existing records, it used to be a painful task. As of SQL Server 2012 that became much easier.

But nobody told me, until Simon casually mentioned it in conversation recently. I had to see it for myself, so I thought I’d share for those who weren’t aware.

Read on to see how. I rarely self-promote in other people’s blog posts (hush, person who knows all the times I’ve done it), but I do have a talk on the topic of near-zero downtime database deployment strategies which includes this and quite a few other notes on what you can do without blocking others. For these sorts of changes, what you’re looking for is asynchronous processing and a Sch-M (schema modification) lock at the very end, such as when rebuilding an index with ONLINE = ON in Enterprise Edition. Alternatively, look for a Sch-M lock only on a metadata table and not the actual data. Andy’s post is an example of the latter.
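For those who, like me, had to see it to believe it, the statement in question is just an ordinary column addition with a default (table and column names here are hypothetical). On SQL Server 2012 and later (Enterprise Edition, as I recall), this is a metadata-only change:

-- The default is recorded in metadata and materialized when rows are read,
-- so existing rows are not scanned or rewritten at ALTER time.
ALTER TABLE dbo.Orders
    ADD OrderSource tinyint NOT NULL
        CONSTRAINT DF_Orders_OrderSource DEFAULT (0);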


Finding Foreign Key Constraints without Backing Indexes

Andy Brownsword goes searching:

That’s all well and good if we’re creating a new foreign key and adding our own index. How do we find existing foreign keys which aren’t yet indexed?

That’s where the query below comes into play.

We use a CTE to find foreign keys which are already indexed. From this we can then return details of other foreign keys along with scripts to create (and roll back) an index to mirror the key:

Click through for the script and notes on how it all works.
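If you want the gist of the approach before clicking through, here is a deliberately simplified sketch (single-column foreign keys only, no index-creation scripts) of how the catalog views fit together; Andy’s version handles the general case:

-- Foreign keys on the referencing (parent) table whose column is not the
-- leading key column of any index on that table.
WITH IndexedColumns AS
(
    SELECT ic.object_id, ic.column_id
    FROM sys.index_columns AS ic
    WHERE ic.key_ordinal = 1
)
SELECT
    fk.name AS ForeignKeyName,
    OBJECT_NAME(fkc.parent_object_id) AS TableName,
    COL_NAME(fkc.parent_object_id, fkc.parent_column_id) AS ColumnName
FROM sys.foreign_keys AS fk
    INNER JOIN sys.foreign_key_columns AS fkc
        ON fkc.constraint_object_id = fk.object_id
    LEFT JOIN IndexedColumns AS ic
        ON ic.object_id = fkc.parent_object_id
        AND ic.column_id = fkc.parent_column_id
WHERE ic.object_id IS NULL;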


Data Integrity and the LIKE Operator

Joe Celko provides some context:

The LIKE operator has been in SQL since the very beginning. It’s a very weak form of pattern matching, that has only two simple wildcards. The wildcards are ‘_‘ (underscore) and ‘%‘ (percent). They respectively stand for any single character and for zero or more characters in a substring. The syntax is pretty straightforward:

Read on for examples of LIKE in action, including in check constraints.
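As a small, hypothetical illustration of my own (not one of Joe’s), even the two standard wildcards are enough for a check constraint to enforce the rough shape of a value:

-- Accept either a five-character postal code or the five-plus-four form with
-- a hyphen. Underscore matches exactly one character, so this checks shape,
-- not content. Other columns elided for brevity.
CREATE TABLE CustomerAddress
(
    PostalCode varchar(10) NOT NULL,
    CONSTRAINT CK_CustomerAddress_PostalCode
        CHECK (PostalCode LIKE '_____' OR PostalCode LIKE '_____-____')
);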
