Guessing At Foreign Key Relationships

Daniel Hutmacher has put together a script to try to find hidden foreign key relationships in a database:

Now, before you go crazy with this stuff, remember, it’s not a magic bullet, but rather some automation help to save you some coding and to help you review your data model. The script doesn’t change the database, it only prints out its suggestions, and this is totally by design.

  • For this to work, you’ll obviously need proper primary keys or unique indexes on your referenced tables.

  • We’re working on the assumption that the referencing and referenced column names are the same. Go ahead and change the script to suit your naming standards (look for the comment in the CTE)

  • The script has no domain knowledge of your database, some of the suggestions are probably going to be downright silly.

This is a good first pass approach, especially if you have a larger database completely lacking in relational integrity.

Indexing Foreign Keys

Kim Tripp looks at the practice of adding non-clustered indexes on columns which make up foreign key constraints:

I’m going to tie today’s post with a question I received recently: if I have a table that has 6 foreign key columns/references should I create one index with all 6 foreign key columns in it, or should I create 6 individual indexes – one for each foreign key reference.

This is an interesting question with a few tangents to cover for our SQL101 series. I consider indexing foreign keys as part of my “Phase 1 of index tuning.” But, let’s go back to some basics before we dive into the answer on this one.

Read on for Kim’s thoughts on the topic and some good practices.

Non-Trusted Foreign Keys

Daniel Janik explains what happens when you don’t have trusted foreign key constraints:

Why is it untrusted? Perhaps we disabled the check to load data and neglected to re-enable it?

No matter what the reason is the next part is not as simple. This is for two reasons.

  1. The data in the child table may not be valid. Since the key was not being checked I may have data in my table that isn’t represented in the parent.

  2. The syntax is a bit silly. As Mike Byrd in Austin, TX says, Microsoft studders. The syntax to reenable is “CHECK CHECK”. Let’s look at how we reenable the Address key check.

Read on for pros and cons of disabling (or not trusting) foreign key constraints.

Uniqueness And Multiple NULL Values

Dennes Torres shows how to allow an indefinite number of NULL values  while guaranteeing non-NULL values are unique:

Regardless of using unique constraint or unique index, the field can accept null values, however the uniqueness will result in only accepting a single row with null value.

The solution to allow nulls in unique fields is create a unique filtered index excluding the nulls of the index, due to that the uniqueness of the nulls will not be validated and multiple rows with nulls will be accepted.

Click through for the code.  I enjoy asking this as an interview question.  It’s a non-trivial problem with a non-trivial solution and isn’t a trick question.

Check Constraints On Date Ranges

Michael Swart has a script to check start versus end dates to ensure there is a valid check constraint for date ranges:

Don’t blindly run scripts that you got from some random guy’s blog. Even if that someone is me. That’s terribly irresponsible.

But this query may be useful if you do want to look for a very specific, simple kind of constraint that may match your business specs. These constraints are just suggestions and may not match your business rules. For example, when I run this query on Adventureworks, I get one “missing” check constraint for HumanResources.Shift(StartTime, EndTime) and when I look at the contents of the Shift table, I get this data:

If you wish non-blindly to run a script you got from that guy’s blog, click through for the script and more details, including a discussion of false positives.

Indexing Foreign Keys

Kendra Little answers a question about indexing foreign key constraints:

We recently had a SQL Server performance assessment. It remarked on two things that made me think:

1 ) tables with foreign keys but missing supporting index
2 ) tables with indexes that are not used

But in our case the remark in Case 2 was on a index that supports a foreign key!

Curtain down!

Now to my question, in which cases should you as a rule create indexes to support a foreign key?

I think Kendra elaborates the pros and cons well.  I’d lean against automatically creating indexes because I’ve worked with scenarios in which you don’t drive from the Parent to the Child; instead, you drive from Child to Parent or Something to Child to Parent and those indexes go unused.

Constraints Or Unique Indexes

Kendra Little discusses unique constraints versus unique indexes:

Unique constraints and unique nonclustered indexes have a lot in common: unique constraints are implemented with a unique index behind the scenes.

While unique nonclustered indexes have a couple of additional features, I can still think of a scenario where it might be best to use unique constraints.

The value I see in unique constraints over unique indexes is semantic:  it says to everyone, “I’m supposed to be unique by virtue of my existence.”  They’re part of the data model, whereas indexes are performance optimizations to handle common data paths.

Surrogate Versus Natural Keys

Kenneth Fisher digs into the debate on surrogate keys versus natural keys:

A natural key is one constructed of data that already exists in the table. For example using latitude and longitude in a table of addresses. Or the social security number in a table of employees. (Before you say anything, yes, the social security number is a horrible primary key. Be patient.)

My personal preference is to use surrogate keys most of the time and put unique constraints (or unique indexes) on the natural key.  There are some occasions in which I’d deviate, but ceteris paribus I’d pick this strategy..

Rename A Primary Key Constraint

Steve Jones shows how to rename a primary key constraint:

When you compare that with the same table in another database, what’s the likelihood that you’ll have the PK named PK__OrderDet__D3B9D30C7D677BB4? Probably pretty low.

This means that if you are looking to deploy changes, and perhaps compare the deployment from one database to the next, you’ll think you have different indexes. Most comparison tools will then want to change the index on your target server, which might be using this technique. Or the choice might be something that performs much worse.

What we want to do is get this named the same on all databases. In this case, the easiest thing to do with rename the constraint on all systems. This is easy to do with sp_rename, which is better than dropping and rebuilding the index.

Do read this and avoid renaming a constraint the bad way.

Dealing With Foreign Keys

Kenneth Fisher discusses foreign key constraints:

I have to be certain to delete from OrderDetail first, then Order (to maintain the RI) and then load Order first then OrderDetail. No big deal in this simple example, but what if I’m dealing with a dozen tables? Or I’m only re-loading the parent (Order)?

The easiest thing to do is to disable the foreign key, load your data, and then re-enable the foreign key. You might be tempted to skip that last step but don’t. RI is very important and in fact a trusted foreign key can be used by the optimizer to improve your query plan. It’s easy to say that the application doesn’t have bugs that cause problems with referential integrity. It’s not like you’re ever going insert an OrderDetail without an Order right? Unfortunately it’s far to easy for mistakes to happen. Maybe not in the application, it might be a mistake in an update meant to fix something else. Our job is to protect the data, and RI is an important part of that. So by all means disable a foreign key to help with a load but make sure you turn it back on when you are done.

If a foreign key constraint isn’t trusted, the optimizer won’t be able to assume relational integrity, and so it’s possible that the optimizer could make sub-par choices when joining tables with a foreign key constraint.


April 2017
« Mar