No matter how long you work with something, you can always find something that you never knew before. I found one about foreign keys this week.
I was reviewing SQL scripts for coworkers and I noticed that the foreign keys were written without referencing the parent table’s column. But the script didn’t fail and created the foreign keys correctly. So how did this work?
I don’t think I’ve ever seen this syntax either. I’m not a big fan of it for the same reason that Deborah isn’t a big fan of it: adding a couple more words does clarify your intent, and so add the words.
By large databases I’m roughly meaning databases with several hundred tables, and I usually see a lot of these tables with several hundred GB’s of data in them.
When I generally ask about the reason for no foreign key, I’m told
- they add overhead
- they give no benefit
- we can’t enter our data properly when we have them
The last one in the above list is generally down to poor modelling – an example being entering a later part of a financial transaction before the first part of the financial transaction has taken place. Once both parts of the financial transaction have taken place then the database is left in a consistent state – BUT, that generally being handled by the app NOT the database – OUCH!
There are times where key constraints are too much—often-updating fact tables might be one such scenario. And some of “too much” comes down to hardware quality. But for the most part, those key constraints are one of the clearest forms of database documentation available, not to mention their normal benefits.
One of the things I think is important in modeling your particular entity is including a primary key (PK). In my DevOps talk I stress this, as I’d rather most attendees come away thinking a PK is important as their first takeaway from the session. There are exceptions, but they are rare, and I would prefer that most tables just have some PK included from the beginning.
A PK ought to be stable as well, and there are plenty of written words about how to pick the PK for your particular problem domain. Often I have received the advice that natural keys are preferred over surrogate keys, and it is worth the effort to try and identify a suitable column (or set of columns) that will guarantee uniqueness. I think that’s good advice, and it’s also advice I tend to ignore.
Read on for Steve’s reasoning. I tend to use surrogate keys out of habit, though I do prefer to put unique key constraints on natural keys to help me reason through data models.
We want to spend our SQL Server licensing dollars wisely, so why ask it to do unnecessary work? More CPU time, more IO, and our client is waiting longer for the process to complete (I don’t like making anyone wait).
There’s a second “gotcha” with the
AFTER INSERTmethod that applies to only some use cases. Let’s say you’re loading some historical data into the table, and that data includes the
LastModifieddate. Or maybe your application has a very specific date that should be inserted into the table for this field.
Andy makes good points.
You should be able to create a #temp in every session. That’s the idea, right? It’s one of the things that differentiates a global temp table from a local temp table. But there can be some difficulties with that.
If you are working with reusable code that uses temp tables (a stored procedure for example), sometimes you need to create a constraint. The thing about constraints is that their names are just as unique as tables, stored procedures etc. i.e. the name of a constraint can only be used once. You can’t have two tables with the same constraint name. In fact, you can’t even have a constraint name that matches a table, stored procedure etc name.
There’s some solid advice in this post.
You will see the code is not the same as when I coded it:
([Value1] IS NOT NULL AND [Value2] IS NULL OR [Value1] IS NULL AND[Value2] IS NOT NULL OR NOT [Value1]=[Value2] AND [Value2]>(2)*[Value1])
Looking as this, one thing stands out to the eye: the parenthesis have been removed from the comparisons, but added to the literal value of 2. (Also, everything is surrounded by square brackets, but as awful as that looks, it is a common thing to see in code that is generated/parsed, such as this or a query plan.) When the person who wrote the similar code saw this, they were concerned it was wrong, then the next person who did a code review was also concerned it was wrong, and when I saw it, I was sure it was right, but only because I had a bit more faith that if this were a problem with SQL Server’s toolset, I would probably have read about it!
Read on as Louis explains the logic. My preference would be to retain the parentheses to make it easier for humans to follow.
Names for constraints are optional meaning that if you don’t provide a name when it’s created or cannot afford one, one will be appointed to you by the system.
These system provided names are messy things and I don’t think I have to discourage you from using them. Kenneth Fisher has already done that in Constraint names, Say NO to the default.
But how do you know whether you have any?
Check out the informative comments as well.
The CREATE INDEX statement is used to do exactly what its name says, it creates an index. But when you say CREATE UNIQUE INDEX, you are doing more than that; you are enforcing a business rule that involves uniqueness.
I have a simple rule on this. Wherever possible business rules like uniqueness, check values, etc. should be part of the design of the table, and not enforced in an external object like an index.
So, rather than a unique index, I’d rather see a unique constraint on the underlying table.
But that’s where real life steps in. I see two scenarios that lead me to occasionally use CREATE UNIQUE INDEX.
Here’s a third: creating constraints can cause blocking issues. If you already have a large table and Enterprise Edition, creating a unique index can be an online operation (unless you have a clustered columnstore index on the table), but a unique constraint is always a blocking activity.
Then, let’s say the requirements are as follows:
1. No values that are either empty or only spaces
2. No leading spaces
3. No trailing spaces
4. Allow NULL if column allows NULL
Let’s look at how we could implement all of these independently, as there certainly are cases where you may wish to allow any or all of the situations in a column.
Click through for the scripts, as well as a time comparison to see how much overhead you’re adding.
As I started to work on this, my first thought was that it would be helpful to know how many tables had a composite primary key. This would give me an idea on how many tables I was dealing with. Thankfully, SQL Server has this information by using system DMVs (dynamic management views) along with the COL_NAME function.
Note: the COL_NAME function will only work with SQL Server 2008 and newer.
All of this time, I’d never known about COL_NAME.