Press "Enter" to skip to content

Category: Data Modeling

Calculated Dimensions

Ginger Grant shows how calculated dimensions can solve the classic role-playing dimension problem in Analysis Services Tabular:

Working with role playing dimensions, which are found when you have, say, multiple dates in a table and you want to relate them back to a single date table, has always been problematic in SQL Server Analysis Services Tabular. Tabular models only allow one active relationship to a single column at a time. The picture on the left shows how tabular models represent a role playing dimension, and the model on the right is the recommended method for modeling the relationships in Analysis Services Tabular, as then users can filter the data on a number of different date tables.

The big downside to this is that one has to import the date table into the model multiple times, meaning the same data is imported again and again. At least that was the case until SQL Server 2016 was released. This week's T-SQL topic, Fixing Old Problems with Shiny New Toys, is a really good reason to describe a better way of handling this problem.

Read on for how to implement calculated dimensions.


Parsing Text Fragments

Aaron Bertrand looks at a way of speeding up LIKE %Something% queries and builds a fragment table:

It’s clear that in this specific case – with an address column of nvarchar(60) and a max length of 26 characters – breaking up each address into fragments can bring some relief to otherwise expensive “leading wildcard” searches. The better payoff seems to happen when the search pattern is larger and, as a result, more unique. I’ve also demonstrated why EXISTS is better in scenarios where multiple matches are possible – with a JOIN, you will get redundant output unless you add some “greatest n per group” logic.

Read the whole thing.  If you’re interested in the concept, I recommend reading up on n-grams, like Alan Burstein’s series and this TechNet article on implementing N-Grams in SQL Server.
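
For a feel of the technique, here is a minimal sketch of the fragment-table idea; this is not Aaron's exact schema, and the table and column names (dbo.Addresses, dbo.AddressFragments, AddressLine1) are made up. The trick is to store every trailing substring of each address so a leading-wildcard search becomes a seekable LIKE 'pattern%', and to use EXISTS so each address appears once in the output:

    -- Hypothetical fragment table: one row per trailing substring of each address.
    CREATE TABLE dbo.AddressFragments
    (
        AddressID int NOT NULL,
        Fragment  nvarchar(60) NOT NULL,
        CONSTRAINT PK_AddressFragments PRIMARY KEY (Fragment, AddressID)
    );

    -- Populate with every suffix of every address (addresses max out at 60 chars).
    INSERT dbo.AddressFragments (AddressID, Fragment)
    SELECT a.AddressID, SUBSTRING(a.AddressLine1, n.n, 60)
    FROM dbo.Addresses AS a
    JOIN (SELECT TOP (60) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
          FROM sys.all_objects) AS n(n)
      ON n.n <= LEN(a.AddressLine1);

    -- The leading-wildcard search "%tone Road%" becomes a seek on the fragments;
    -- EXISTS keeps one output row per address even when several fragments match.
    SELECT a.*
    FROM dbo.Addresses AS a
    WHERE EXISTS (SELECT 1
                  FROM dbo.AddressFragments AS f
                  WHERE f.AddressID = a.AddressID
                    AND f.Fragment LIKE N'tone Road%');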


Testing Nested Sets

Nate Johnson continues his nested set series:

Now the complex.  To validate that our tree is properly structured, the following statements need to be true:

  1. Each node’s Right value is greater than its Left.

  2. More to the point, each node’s Right value is greater than all of its ancestors’ Left values.

  3. Similarly, each node’s Left value is less than all of its descendants’ Left values (and Right values, obviously!)

  4. Leaf nodes have no gaps between Left & Right: Right = Left + 1

  5. Depth is easy to verify because we already wrote the rCTE to calculate it!

  6. And of course, no orphans – all ParentIDs lead to an actual parent node, except of course if they’re NULL (root nodes).

Read on for further explanation of these points.
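
If you want a starting point before reading Nate's version, here is a rough sketch of a few of those checks against a hypothetical dbo.Tree (ID, ParentID, [Left], [Right]) table; his column names may differ. On a healthy tree, each query returns zero rows:

    -- Rule 1: every node's Right value must exceed its Left value.
    SELECT t.ID
    FROM dbo.Tree AS t
    WHERE t.[Right] <= t.[Left];

    -- Rule 4: leaf nodes (no children) must have Right = Left + 1.
    SELECT t.ID
    FROM dbo.Tree AS t
    WHERE NOT EXISTS (SELECT 1 FROM dbo.Tree AS c WHERE c.ParentID = t.ID)
      AND t.[Right] <> t.[Left] + 1;

    -- Rule 6: no orphans; every non-NULL ParentID must point at a real node.
    SELECT t.ID
    FROM dbo.Tree AS t
    WHERE t.ParentID IS NOT NULL
      AND NOT EXISTS (SELECT 1 FROM dbo.Tree AS p WHERE p.ID = t.ParentID);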


Extending Nested Sets

Nate Johnson extends the nested sets model to include a depth attribute:

Depth is pretty simple to add if you’ve already got a tree full of data.  We can use a recursive common table expression, or “rCTE”.  While normally these are frown-worthy (remember, recursion is not SQL’s strong suit), we’re only using it one time to populate an existing data set, so we can keep on smiling.

Hierarchies in SQL are important but not well understood.
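
As a rough illustration of the approach (not Nate's code, and assuming a hypothetical dbo.Tree table with ID, ParentID, and a newly added Depth column), the rCTE looks something like this:

    -- Walk down from the root nodes, counting levels as we go.
    ;WITH rcte AS
    (
        SELECT t.ID, 0 AS Depth              -- roots: no parent
        FROM dbo.Tree AS t
        WHERE t.ParentID IS NULL

        UNION ALL

        SELECT c.ID, rcte.Depth + 1          -- children: one level deeper
        FROM dbo.Tree AS c
        JOIN rcte ON rcte.ID = c.ParentID
    )
    UPDATE t
    SET t.Depth = rcte.Depth
    FROM dbo.Tree AS t
    JOIN rcte ON rcte.ID = t.ID;

Since it runs once to backfill an existing column, the usual worries about recursive CTE performance don't really apply here.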


Graphs In SQL

Joe Celko models graphs using ANSI SQL:

The initial proposed solutions to construct the subgraphs were essentially procedural traversal, dumping pairs of nodes into a temp table and incrementing a counter.

Let us try getting out of a procedural mindset and starting to think in sets instead. Let us give each subgraph a name and a member node. Essentially, this directly models the diagram I gave you a paragraph or two ago. The next question is how we get names for the subgraphs. I will propose the simple solution that each subgraph takes the name of the lowest element in it. This would give us a table that looks like this:

Read the whole thing.  Given the recent revival of graph databases, it’s important to note that you can model network-style problems using either graphs or relations; the trick is figuring out which will give you better long-run performance.
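
Celko's own solution is worth reading in full. Purely as a sketch of the "name each subgraph after its lowest node" idea (and not his code), label propagation works against a hypothetical undirected edge list dbo.Edges (NodeA, NodeB): each pass is a set-based update, repeated until the labels stop changing.

    -- Hypothetical edge list; treat each edge as undirected.
    CREATE TABLE #Component (Node int PRIMARY KEY, Subgraph int NOT NULL);

    INSERT #Component (Node, Subgraph)
    SELECT n.Node, n.Node                     -- every node starts by naming itself
    FROM (SELECT NodeA FROM dbo.Edges
          UNION
          SELECT NodeB FROM dbo.Edges) AS n(Node);

    -- Each pass gives every node the smallest label among its neighbors,
    -- so each subgraph converges on the name of its lowest node.
    DECLARE @changed int = 1;
    WHILE @changed > 0
    BEGIN
        UPDATE c
        SET c.Subgraph = x.NewLabel
        FROM #Component AS c
        JOIN (SELECT e.Node, MIN(nc.Subgraph) AS NewLabel
              FROM (SELECT NodeA AS Node, NodeB AS Neighbor FROM dbo.Edges
                    UNION ALL
                    SELECT NodeB, NodeA FROM dbo.Edges) AS e
              JOIN #Component AS nc ON nc.Node = e.Neighbor
              GROUP BY e.Node) AS x
          ON x.Node = c.Node
        WHERE x.NewLabel < c.Subgraph;

        SET @changed = @@ROWCOUNT;
    END;

    SELECT Subgraph, Node FROM #Component ORDER BY Subgraph, Node;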


Multi-Database Schema Comparison

Kevin Hill shows how to perform schema comparison across multiple databases concurrently:

I recently had the need to compare a “Gold” or “Master” copy of our database to the 300 client copies of that database that exist in our Production environment.  I’m not alone…many fellow DBAs have had the same need.   Google searches for this confirm it.   This is for an upcoming upgrade to the application that will need post-upgrade comparison/verification.

There are 3rd party tools that do SQL compares…my particular favorite is aptly named SQL Compare, from Red Gate.  I’ve been using it off and on for 10 years.   I don’t know if it can be set up to hit more than one database at a time.  The other issue is that I don’t have a copy here.

Microsoft’s SQL Server Data Tools will also do this within Visual Studio.   Still one database at a time.  I forget where, but someone pointed me to the fact that SSDT uses SQLPackage.exe under the hood to do the work.  I figure if I can run it at a command line I can script out all of the databases.  I’m not much of a DOS scripting guy, so everything that follows is just my hack version…but it works, and not just on my machine!

This is very useful if you work in an environment with multiple copies of databases—or even if you have dev, test, and prod versions of the same database.
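
To get a feel for the shape of the approach: since SqlPackage.exe is just a command-line tool, you can let T-SQL generate one command per client database and paste the output into a batch file. The install path, dacpac name, output folder, and naming convention below are placeholders, not Kevin's actual script.

    -- Emit one SqlPackage.exe DeployReport command per client database.
    SELECT '"C:\Program Files\Microsoft SQL Server\140\DAC\bin\SqlPackage.exe"'
         + ' /Action:DeployReport'
         + ' /SourceFile:"C:\Compare\GoldMaster.dacpac"'
         + ' /TargetServerName:"PRODSQL01"'
         + ' /TargetDatabaseName:"' + name + '"'
         + ' /OutputPath:"C:\Compare\Reports\' + name + '.xml"'
    FROM sys.databases
    WHERE name LIKE 'Client%';   -- however the client copies are identified

Each DeployReport run produces an XML report of the differences between the dacpac and that target database, which is handy for spot-checking after the upgrade.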


Codd’s Twelve Rules

E.F. Codd’s 12 rules (well, thirteen):

Twelve rules are cited below as part of a test to determine whether a product that is claimed to be fully relational is actually so. Use of the term “fully relational” in this report is slightly more stringent than in my Turing paper (written in 1981). This is partly because vendors in their ads and manuals have translated the term “minimally relational” to “fully relational” and partly because in this report, we are dealing with relational DBMS and not relational systems in general, which would include mere query-reporting systems.

However, the 12 rules tend to explain why full support of the relational model is in the users’ interest. No new requirements are added to the relational model. A grading scheme is later defined and used to measure the degree of fidelity to the relational model.

This particular article seems less important thirty years later, but it was vital in the early days of relational systems to understanding what, precisely, a relational database management system ought to do and—just as importantly—what it ought not do.  It wasn’t enough to slap SQL on top of a hierarchical database platform and call it relational.


Add And Remove Identity Columns

Dan Guzman shows how to use partition switching to add or remove identity columns from existing tables:

All tables are partitioned from a database storage engine perspective since SQL Server 2005, although multiple partitions require Enterprise Edition. The implication of this storage architecture is that one can use ALTER TABLE…SWITCH regardless of SQL Server edition to move the entire contents of one table to another without physically moving rows as long as the source and target tables have an identical schema and are physically aligned (indexes and tables on same filegroup). SWITCH performs fast storage meta-data changes so the operation typically takes less than a second regardless of table size. The target table must be empty before SWITCH. After SWITCH, the target table will contain the data and the source table will be empty (similar to TRUNCATE).

Since the identical schema restriction does not include the IDENTITY column property, SWITCH is a handy technique to add or remove IDENTITY from an existing column of a non-empty table. For an empty table, a simple drop and create is easier and more efficient.

This is one of the few really good uses of the SWITCH operator in Standard Edition, but if you ever do need it, you’ll be glad it’s there.
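
The general pattern looks something like this; it is a bare-bones sketch against a hypothetical dbo.MyTable, not Dan's script, and his post covers the alignment requirements and gotchas in detail.

    -- Goal: add IDENTITY to a populated table without physically moving rows.
    -- The new table must match the old one's schema, indexes, and filegroup;
    -- only the IDENTITY property is allowed to differ.
    CREATE TABLE dbo.MyTable_New
    (
        ID   int IDENTITY(1,1) NOT NULL CONSTRAINT PK_MyTable_New PRIMARY KEY,
        Col1 varchar(50) NOT NULL
    );

    -- Metadata-only move: near-instant regardless of row count.
    ALTER TABLE dbo.MyTable SWITCH TO dbo.MyTable_New;

    -- Bring the identity seed up past the existing maximum value,
    -- then swap the names.
    DBCC CHECKIDENT ('dbo.MyTable_New');
    DROP TABLE dbo.MyTable;
    EXEC sp_rename 'dbo.MyTable_New', 'MyTable';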


Trusting Foreign Keys

Jefferson Elias describes the concept of trusted foreign keys as well as their analog:

Specifying WITH CHECK in a statement tells SQL Server that the user wants it to validate the constraint against every single row in the table and then, if successful, enable it.

In contrast, specifying WITH NOCHECK, which is the default for an existing constraint, means that the constraint is enabled but no validation has been made on it. Even if this mode is faster to run, it can lead to severe side effects on performance: SQL Server doesn’t trust the constraint, as it has not validated it. We refer to such a foreign key as an « untrusted foreign key ». As a consequence, the query optimizer won’t use the constraint to do its job…

There are benefits to having trusted foreign key constraints.  Check out the article for more details as well as how to fix this issue.
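
To see where a database stands, sys.foreign_keys exposes an is_not_trusted flag, and re-validating is one ALTER TABLE away; the table and constraint names in the second statement below are placeholders.

    -- Foreign keys the optimizer will not trust.
    SELECT OBJECT_NAME(parent_object_id) AS TableName,
           name AS ForeignKeyName
    FROM sys.foreign_keys
    WHERE is_not_trusted = 1;

    -- Re-validate and enable: the first CHECK means "scan and validate the
    -- existing rows"; CHECK CONSTRAINT then (re)enables the constraint.
    ALTER TABLE dbo.OrderDetail
        WITH CHECK CHECK CONSTRAINT FK_OrderDetail_OrderHeader;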


Joining On NULL

Erik Darling has opened a can of worms here:

WITH ALL THE TROUBLE NULLS CAUSE…

You’d think people would be more inclined to avoid them. Slap a NOT NULL constraint and a default value on your column and call it a day. I’m perfectly fine with bizarro world canary values. If it’s an integer column, some really high (low?) negative number. If it’s date-based, why not have it be the lowest value your choice accommodates?

Check out the comments, definitely.  I don’t think it’s as clear-cut as Erik argues; the idea of NULL has been and will remain controversial because it’s a useful concept but one which requires explicit consideration.
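
For a feel of the underlying behavior, here is a trivial demo (not Erik's example): NULL = NULL evaluates to UNKNOWN, so NULL rows silently fall out of an inner join, and the canary-value workarounds bring their own trade-offs.

    CREATE TABLE #a (ref int NULL);
    CREATE TABLE #b (ref int NULL);
    INSERT #a VALUES (1), (NULL);
    INSERT #b VALUES (1), (NULL);

    -- Returns only ref = 1; the NULL rows never match each other.
    SELECT a.ref
    FROM #a AS a
    JOIN #b AS b ON b.ref = a.ref;

    -- One workaround people reach for: coalesce both sides to a canary value
    -- that real data can never contain (here, -1 is assumed to be safe).
    SELECT a.ref
    FROM #a AS a
    JOIN #b AS b ON ISNULL(b.ref, -1) = ISNULL(a.ref, -1);

Whether that second form is a good idea is exactly the sort of thing the comments argue about.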
