By large databases I roughly mean databases with several hundred tables, and I usually see many of these tables holding several hundred GB of data.
When I generally ask about the reason for no foreign key, I’m told
- they add overhead
- they give no benefit
- we can’t enter our data properly when we have them
The last one in the above list generally comes down to poor modelling – an example being entering a later part of a financial transaction before the first part of the transaction has taken place. Once both parts of the financial transaction have taken place, the database is left in a consistent state – BUT that consistency is generally being handled by the app, NOT the database – OUCH!
There are times when key constraints are too much – frequently updated fact tables might be one such scenario – and some of "too much" comes down to hardware quality. But for the most part, those key constraints are one of the clearest forms of database documentation available, not to mention their normal benefits.
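As a minimal sketch of the documentation benefit (table and column names here are hypothetical), a foreign key both enforces integrity and tells the next reader exactly how the tables relate:

```sql
-- Hypothetical parent/child tables; names are illustrative only.
CREATE TABLE dbo.Account
(
    AccountID int NOT NULL,
    CONSTRAINT PK_Account PRIMARY KEY (AccountID)
);

CREATE TABLE dbo.AccountTransaction
(
    TransactionID int NOT NULL,
    AccountID int NOT NULL,
    CONSTRAINT PK_AccountTransaction PRIMARY KEY (TransactionID),
    -- The foreign key documents the relationship and blocks orphaned rows.
    CONSTRAINT FK_AccountTransaction_Account
        FOREIGN KEY (AccountID) REFERENCES dbo.Account (AccountID)
);
```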
One of the things I think is important in modeling your particular entity is including a primary key (PK). In my DevOps talk I stress this; if there's one thing I'd like most attendees to take away from the session, it's that a PK is important. There are exceptions, but they are rare, and I would prefer that most tables just have some PK included from the beginning.
A PK ought to be stable as well, and there are plenty of written words about how to pick the PK for your particular problem domain. Often I have received the advice that natural keys are preferred over surrogate keys, and it is worth the effort to try and identify a suitable column (or set of columns) that will guarantee uniqueness. I think that’s good advice, and it’s also advice I tend to ignore.
Read on for Steve’s reasoning. I tend to use surrogate keys out of habit, though I do prefer to put unique key constraints on natural keys to help me reason through data models.
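That pattern can be sketched like this (table and column names are hypothetical): a surrogate identity column as the PK, with a unique constraint on the natural key so the model still documents and enforces uniqueness:

```sql
CREATE TABLE dbo.Customer
(
    -- Surrogate primary key: stable, narrow, never changes.
    CustomerID int IDENTITY(1,1) NOT NULL,
    -- Natural key column.
    TaxNumber varchar(20) NOT NULL,
    CONSTRAINT PK_Customer PRIMARY KEY (CustomerID),
    -- Unique constraint on the natural key keeps the model honest.
    CONSTRAINT UQ_Customer_TaxNumber UNIQUE (TaxNumber)
);
```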
We want to spend our SQL Server licensing dollars wisely, so why ask it to do unnecessary work? More CPU time, more IO, and our client is waiting longer for the process to complete (I don’t like making anyone wait).
There’s a second “gotcha” with the AFTER INSERT method that applies to only some use cases. Let’s say you’re loading some historical data into the table, and that data includes the LastModified date. Or maybe your application has a very specific date that should be inserted into the table for this field.
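The gotcha can be sketched like this (table and trigger names are hypothetical): an AFTER INSERT trigger that stamps LastModified will silently overwrite whatever value the historical load supplied:

```sql
CREATE TABLE dbo.Orders
(
    OrderID int NOT NULL PRIMARY KEY,
    LastModified datetime2(0) NOT NULL
);
GO
-- This trigger stamps the current time on every insert...
CREATE TRIGGER dbo.trg_Orders_Insert
ON dbo.Orders
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;
    UPDATE o
    SET LastModified = SYSDATETIME()
    FROM dbo.Orders AS o
    JOIN inserted AS i ON i.OrderID = o.OrderID;
END;
GO
-- ...so this historical value is silently replaced by the current time.
INSERT dbo.Orders (OrderID, LastModified)
VALUES (1, '2015-01-01T00:00:00');
```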
Andy makes good points.
You should be able to create a #temp in every session. That’s the idea, right? It’s one of the things that differentiates a global temp table from a local temp table. But there can be some difficulties with that.
If you are working with reusable code that uses temp tables (a stored procedure, for example), sometimes you need to create a constraint. The thing about constraints is that their names must be unique, just like the names of tables, stored procedures, etc. – i.e., a constraint name can only be used once. You can’t have two tables with the same constraint name. In fact, you can’t even have a constraint name that matches the name of a table, stored procedure, etc.
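A sketch of how this bites (names are hypothetical): local temp table names are made unique per session behind the scenes, but explicit constraint names are not, so two sessions running the same procedure collide. Leaving the constraint unnamed lets SQL Server generate a session-unique name:

```sql
-- Named constraint: if two sessions run this concurrently, the second
-- CREATE TABLE fails because PK_TempIds already exists in tempdb.
CREATE TABLE #Ids
(
    Id int NOT NULL,
    CONSTRAINT PK_TempIds PRIMARY KEY (Id)
);
DROP TABLE #Ids;

-- Unnamed inline constraint: SQL Server generates a unique name,
-- so every session can create its own copy safely.
CREATE TABLE #Ids
(
    Id int NOT NULL PRIMARY KEY
);
DROP TABLE #Ids;
```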
There’s some solid advice in this post.
You will see the code is not the same as when I coded it:
([Value1] IS NOT NULL AND [Value2] IS NULL OR [Value1] IS NULL AND [Value2] IS NOT NULL OR NOT [Value1]=[Value2] AND [Value2]>(2)*[Value1])
Looking at this, one thing stands out to the eye: the parentheses have been removed from the comparisons but added to the literal value of 2. (Also, everything is surrounded by square brackets; as awful as that looks, it is a common thing to see in code that is generated/parsed, such as this or a query plan.) When the person who wrote the similar code saw this, they were concerned it was wrong; then the next person who did a code review was also concerned it was wrong; and when I saw it, I was sure it was right – but only because I had a bit more faith that if this were a problem with SQL Server’s toolset, I would probably have read about it!
Read on as Louis explains the logic. My preference would be to retain the parentheses to make it easier for humans to follow.
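The normalization is safe because of T-SQL operator precedence: comparisons bind tighter than NOT, NOT tighter than AND, and AND tighter than OR, so the stripped version groups exactly as the original did. A quick sketch (the variable values are arbitrary):

```sql
-- Both predicates group identically because comparison > NOT > AND > OR
-- in SQL Server's operator precedence.
DECLARE @Value1 int = 3, @Value2 int = 10;

SELECT CASE WHEN (@Value1 IS NOT NULL AND @Value2 IS NULL
               OR @Value1 IS NULL AND @Value2 IS NOT NULL
               OR NOT @Value1 = @Value2 AND @Value2 > (2) * @Value1)
            THEN 'passes' ELSE 'fails' END AS StrippedParens,
       CASE WHEN ((@Value1 IS NOT NULL AND @Value2 IS NULL)
               OR (@Value1 IS NULL AND @Value2 IS NOT NULL)
               OR (NOT (@Value1 = @Value2) AND @Value2 > 2 * @Value1))
            THEN 'passes' ELSE 'fails' END AS ExplicitParens;
-- The two columns agree for any pair of values.
```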
Names for constraints are optional, meaning that if you don’t provide a name when one is created – or cannot afford one – one will be appointed to you by the system.
These system-provided names are messy things, and I don’t think I have to discourage you from using them – Kenneth Fisher has already done that in Constraint names, Say NO to the default.
But how do you know whether you have any?
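One way to sketch that check: the constraint catalog views each expose an is_system_named flag, so a union across them lists every system-named constraint in the database:

```sql
-- Constraints whose names were generated by the system.
SELECT name, type_desc, OBJECT_NAME(parent_object_id) AS table_name
FROM sys.key_constraints
WHERE is_system_named = 1
UNION ALL
SELECT name, type_desc, OBJECT_NAME(parent_object_id)
FROM sys.check_constraints
WHERE is_system_named = 1
UNION ALL
SELECT name, type_desc, OBJECT_NAME(parent_object_id)
FROM sys.default_constraints
WHERE is_system_named = 1
UNION ALL
SELECT name, type_desc, OBJECT_NAME(parent_object_id)
FROM sys.foreign_keys
WHERE is_system_named = 1;
```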
Check out the informative comments as well.
The CREATE INDEX statement is used to do exactly what its name says, it creates an index. But when you say CREATE UNIQUE INDEX, you are doing more than that; you are enforcing a business rule that involves uniqueness.
I have a simple rule on this. Wherever possible business rules like uniqueness, check values, etc. should be part of the design of the table, and not enforced in an external object like an index.
So, rather than a unique index, I’d rather see a unique constraint on the underlying table.
But that’s where real life steps in. I see two scenarios that lead me to occasionally use CREATE UNIQUE INDEX.
Here’s a third: creating constraints can cause blocking issues. If you already have a large table and Enterprise Edition, creating a unique index can be an online operation (unless you have a clustered columnstore index on the table), but a unique constraint is always a blocking activity.
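A sketch of the two options side by side (table and column names are hypothetical; the online build requires Enterprise Edition):

```sql
-- Option 1: unique constraint – part of the table's declared design,
-- but adding it to an existing large table blocks while it validates.
ALTER TABLE dbo.Customer
    ADD CONSTRAINT UQ_Customer_Email UNIQUE (Email);

-- Option 2: unique index – the same uniqueness enforcement, and on
-- Enterprise Edition it can be built online to avoid blocking.
CREATE UNIQUE INDEX UX_Customer_Email
    ON dbo.Customer (Email)
    WITH (ONLINE = ON);
```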
Then, let’s say the requirements are as follows:
1. No values that are either empty or only spaces
2. No leading spaces
3. No trailing spaces
4. Allow NULL if column allows NULL
Let’s look at how we could implement all of these independently, as there certainly are cases where you may wish to allow any or all of the situations in a column.
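One way to sketch all four rules in a single check constraint (table and column names are hypothetical). Rule 4 comes for free: a NULL value makes the predicate evaluate to UNKNOWN, which a check constraint allows:

```sql
CREATE TABLE dbo.Person
(
    PersonID int NOT NULL PRIMARY KEY,
    GivenName varchar(50) NULL,
    CONSTRAINT CK_Person_GivenName CHECK
    (
        GivenName <> ''               -- rule 1: not empty or only spaces*
        AND GivenName NOT LIKE ' %'   -- rule 2: no leading spaces
        AND GivenName NOT LIKE '% '   -- rule 3: no trailing spaces
    )
    -- * Because SQL Server pads trailing spaces in string equality,
    --   '   ' = '' is true, so GivenName <> '' also rejects all-space values.
);
```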
Click through for the scripts, as well as a time comparison to see how much overhead you’re adding.
As I started to work on this, my first thought was that it would be helpful to know how many tables had a composite primary key. This would give me an idea of how many tables I was dealing with. Thankfully, SQL Server exposes this information through system DMVs (dynamic management views) along with the COL_NAME function.
Note: the COL_NAME function will only work with SQL Server 2008 and newer.
All of this time, I’d never known about COL_NAME.
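A sketch of that kind of query, assuming only the standard catalog views (STRING_AGG requires SQL Server 2017 or newer): count the primary key columns per table and use COL_NAME to list them.

```sql
-- Tables whose primary key spans more than one column.
SELECT OBJECT_NAME(i.object_id) AS table_name,
       COUNT(*) AS pk_column_count,
       STRING_AGG(COL_NAME(ic.object_id, ic.column_id), ', ') AS pk_columns
FROM sys.indexes AS i
JOIN sys.index_columns AS ic
    ON ic.object_id = i.object_id
   AND ic.index_id = i.index_id
WHERE i.is_primary_key = 1
GROUP BY i.object_id
HAVING COUNT(*) > 1;
```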
SQL Server tracks untrusted foreign keys in sys.foreign_keys with a column called is_not_trusted. There are a number of reasons why a foreign key may have become untrusted; below are a couple of examples:
- The foreign key was disabled using the ‘NOCHECK’ option, then re-enabled using ‘CHECK’ (not to be confused with ‘WITH CHECK’)
- The foreign key was disabled using the ‘NOCHECK’ option, primary key data was deleted, and the foreign key was then enabled using only ‘CHECK’ (again, not to be confused with ‘WITH CHECK’)
So what happens when you try to enable a foreign key ‘WITH CHECK’ (check existing data for referential integrity)? If the data is consistent, this is going to succeed. However, if, for example, rows have been deleted so that the primary key data no longer exists but the foreign key data does, then this is going to fail miserably.
What I like about this post is that he does more than just saying “hey, here’s how you get the key constraint to be trusted again;” he goes further and shows you how to figure out if it will work beforehand.
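The moving parts can be sketched like this (the table and constraint names are hypothetical): find the untrusted keys, pre-check for orphans, and only then re-validate.

```sql
-- Find foreign keys that SQL Server no longer trusts.
SELECT name, OBJECT_NAME(parent_object_id) AS table_name
FROM sys.foreign_keys
WHERE is_not_trusted = 1;

-- Check beforehand whether re-validation will succeed: any rows returned
-- here are orphans that would make WITH CHECK fail.
SELECT c.AccountID
FROM dbo.AccountTransaction AS c
LEFT JOIN dbo.Account AS p ON p.AccountID = c.AccountID
WHERE p.AccountID IS NULL;

-- If clean, re-enable and re-validate in one statement. The double CHECK is
-- intentional: WITH CHECK validates existing rows, CHECK CONSTRAINT enables.
ALTER TABLE dbo.AccountTransaction
    WITH CHECK CHECK CONSTRAINT FK_AccountTransaction_Account;
```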