Category: Data Modeling

I recently had the need to compare a “Gold” or “Master” copy of our database to the 300 client copies of that database that exist in our Production environment. I’m not alone…many fellow DBAs have had the same need. Google searches for this confirm it. This is for an upcoming upgrade to the application that will need post-upgrade comparison/verification.

There are 3rd party tools that do SQL Compares…my particular favorite is aptly named SQL Compare from Red Gate. I’ve been using it off an on for 10 years. I don’t know if it can be set up to hit more than one database at a time. The other issue is that I don’t have a copy here.

Microsoft’s SQL Server Data Tools will also do this within Visual Studio. Still one database at a time. I forget where, but someone pointed me to the fact that SSDT uses SQLPackage.exe under the hood to do the work. I figure if I can run it at a command line I can script out all of the databases. I’m not much of a DOS scripting guy, so everything that follows is just my hack version…but it works, and not just on my machine!

This is very useful if you work in an environment with multiple copies of databases—or even if you have dev, test, and prod versions of the same database.

Comments closed

Codd’s Twelve Rules

Published 2016-05-30 by Kevin Feasel

E.F. Codd’s 12 rules (well, thirteen):

Twelve rules are cited below as part of a test to determine whether a product that is claimed to be fully relational is actually so. Use of the term “fully relational” in this report is slightly more stringent than in my Turing paper (written in 1981). This is partly because vendors in their ads and manuals have translated the term “minimally relational” to “fully relational” and partly because in this report, we are dealing with relational DBMS and not relational systems in general, which would include mere query-reporting systems.

However, the 12 rules tend to explain why full support of the relational model is in the users’ interest. No new requirements are added to the relational model. A grading scheme is later defined and used to measure the degree of fidelity to the relational model.

This particular article seems less important thirty years later, but it was vital in the early days of relational systems to understanding what, precisely, a relational database management system ought to do and—just as importantly—what it ought not do. It wasn’t enough to slap SQL on top of a hierarchical database platform and call it relational.

Comments closed

Add And Remove Identity Columns

Published 2016-04-18 by Kevin Feasel

Dan Guzman shows how to use partition switching to add or remove identity columns from existing tables:

All tables are partitioned from a database storage engine perspective since SQL Server 2005, although multiple partitions require Enterprise Edition. The implication of this storage architecture is that one can use ALTER TABLE…SWITCH regardless of SQL Server edition to move the entire contents of one table to another without physically moving rows as long as the source and target tables have an identical schema and are physically aligned (indexes and tables on same filegroup). SWITCH performs fast storage meta-data changes so the operation typically takes less than a second regardless of table size. The target table must be empty before SWITCH. After SWITCH, the target table will contain the data and the source table will be empty (similar to TRUNCATE).

Since the identical schema restriction does not include the IDENTITY column property, SWITCH is a handy technique to add or remove IDENTITY from an existing column of a non-empty table. For an empty table, a simple drop and create is easier and more efficient.

This is one of the few really good uses of the SWITCH operator in standard edition, but if ever you do need it, you’ll be glad it’s there.

Comments closed

Trusting Foreign Keys

Published 2016-03-22 by Kevin Feasel

Jefferson Elias describes the concept of trusted foreign keys as well as their analog:

Specifying WITH CHECK in a statement tells to SQL Server the user wants it to validate the constraint against every single row in the table, then, if successful, enable it.

In contrast, specifying WITH NOCHECK, which is the default for an existing constraint, means that the constraint is enabled but no validation has been made on it. Even if this mode is faster to run, it can lead to severe side effects on performance: SQL Server doesn’t trust the constraint as it has not validated it. We refer to such a foreign key as an « untrusted foreign key ». As a consequence, the query optimizer won’t use the constraint to do his job…

There are benefits to having trusted foreign key constraints. Check out the article for more details as well as how to fix this issue.

Comments closed

Joining On NULL

Published 2016-03-18 by Kevin Feasel

Erik Darling has opened a can of worms here:

WITH ALL THE TROUBLE NULLS CAUSE…

You’d think people would be more inclined to avoid them. Slap a NOT NULL constraint and a default value on your column and call it a day. I’m perfectly fine with bizarro world canary values. If it’s an integer column, some really high (low?) negative number. If it’s date-based, why not have it be the lowest value your choice accomodates?

Check out the comments, definitely. I don’t think it’s as clear-cut as Erik argues; the idea of NULL has been and will remain controversial because it’s a useful concept but one which requires explicit consideration.

Comments closed

Data Modeling In Power BI

Published 2016-03-14 by Kevin Feasel

Ginger Grant discusses creating models in Power BI:

When comparing the features of the Online Client with the Desktop version of Power BI, there is one very obvious difference, there is no way to create a data model in Power BI online. It is not possible to create a data model using the online client. The online client is designed to connect to an existing online source such as Sales Force or Azure DB. If you are using an existing model, there is no need to create one. When using the enterprise gateway, which uses an on-premises database such as a SQL Server, SSAS or Hana, the data model is contained within the database exposed via the enterprise gateway, so again no reason exists to create a data model. Report creation can occur either using the online client or desktop as there is compelling technical reason that I am aware of which would determine where the report is created.

Cf. an earlier post on Power Pivot and Power BI modeling.

Comments closed

Table Smells

Published 2016-03-14 by Kevin Feasel

Phil Factor has a query he shares to discern table smells in SQL Server:

Table smells in SQL Server are just like code smells. They are just an indication that something may need to be checked or investigated. They are easy to check as well, because you can generally ferret through the object catalog metadata views to flush out the aspects that just don’t smell right. In this blog, I show the sort of query I’ll use. Actually, I generally use rather more strict criteria because I’d be concerned about tables that don’t seem to be making full use of constraints, and tables that don’t have ‘soft’ dependencies (aren’t referenced in code within the metadata, from views, procedures or functions). I like to have a result that displays the names of the tables that look suspect, along with the list of the ‘smells’.

This is a great start. As Phil notes in the comments, it’s not necessarily that these are wrong so much as that if you see them, there ought to be a specific reason for it to be this way.

Comments closed

Power Pivot And Power BI Data Modeling

Published 2016-02-22 by Kevin Feasel

Avi Singh has a post on data modeling in Power Pivot and Power BI:

It was Greg, who suggested that we form a book reading club. Our first book was one I had heard about, but never read – The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling By Ralph Kimball. As a business analyst, I had leaned heavily on Excel, along with a mishmash of other technologies. Data warehouse and data modeling didn’t seem like topics that would be relevant to me; more for an IT/BI team perhaps. But I figured, it couldn’t hurt to learn something new.

Our book club meetings looked more as if, class was in session. We brought in our questions, and Greg patiently answered them, helping us realize the importance of the topics, and trade-offs involved in various choices. As things go, our reading club was disbanded before we were even halfway through the book. But the knowledge that I had gained, helped me grow by leaps and bounds in my Power Pivot and Power BI journey.

Kimball-style fact-dimensional modeling remains a brilliant solution.

Comments closed

Logical Data Models

Published 2016-02-22 by Kevin Feasel

Kevin Kline discusses logical data modeling:

In a recent blog post entitled Is Logical Data Modeling Dead?, Karen Lopez (b | t) comments on the trends in the data modeling discipline and shares her own processes and preferences for logical data modeling (LDM). Her key point is that LDMs are on the decline primarily because they (and their creators) have failed to adapt to changing development processes and trends.

I love all things data modeling. I found data models to be a soothing and reassuring roadmap that underpinned the requirements analysis and spec writing of the Dev team, as well as a supremely informative artifact of the Dev process which I would constantly refer to when writing new T-SQL code and performing maintenance. However, as time has passed, I have been surprised by how far it has fallen out of favor.

This is an interesting discussion. I’m not sure I’ve ever created a true logical data model. I’ve worked with systems which could potentially take advantage of them, but they never hit the top of the priority list.

Comments closed

Anchor Modeling

Published 2016-01-05 by Kevin Feasel

Steph Locke has a presentation on Anchor Modeling as 6th Normal Form:

Anchor Modelling moves you beyond third normal form and into sixth normal form. What does this mean? Essentially it means that an attribute is stored independently against the key, not in a big table with other attributes. This means you can easily store metadata about that attribute and do full change tracking with ease. The historical problem with this methodology is that it makes writing queries a real pain. Anchor Modelling overcomes this by providing views that combine all the attribute data together.

Anchor Modeling is a rather different approach, so if it sounds interesting, check out the tutorial.

Comments closed

M	T	W	T	F	S	S
						1
2	3	4	5	6	7	8
9	10	11	12	13	14	15
16	17	18	19	20	21	22
23	24	25	26	27	28	29
30	31