Press "Enter" to skip to content

Category: Data Modeling

Comparing Views to Derived Tables and CTEs

Itzik Ben-Gan takes us through views:

As usual when discussing relational theory, we SQL practitioners are often told that the terminology we’re using is wrong. So, in this spirit, right off the bat, I’ll start by saying that when you use the term tables and views, it’s wrong. I’ve learned this from Chris Date.

Recall that a table is SQL’s counterpart to a relation (oversimplifying the discussion around values and variables a bit). A table could be a base table defined as an object in the database, or it could be a table returned by an expression—more specifically, a table expression. That’s similar to the fact that a relation could be one that is returned from a relational expression. A table expression could be a query.

Now, what is a view? It’s a named table expression, much like a CTE is a named table expression. It’s just that like I said, a view is a reusable named table expression that is created as an object in the database, and is accessible to those who have the right permissions. This is all to say, a view is a table. It’s not a base table, but a table nonetheless. So just like saying “a rectangle and a square” or “a whisky and a Lagavulin” would seem strange (unless you had too much Lagavulin!), using “tables and views” is as improper.

Yeah, if we’re going to push our glasses up the bridges of our noses and get all relational here, we’d have the relvar (which is a name and collection of attributes + data types) and the relation (which you can think of as a relvar at a particular point in time—that’s where we get tuples of data). And as Itzik points out, what the RDBMS world calls a view quite neatly fits the definition of a relvar, as we define a name and collection of attributes + data types.

If all of this could not mean less to you, still read the article for Itzik’s view on views.

Comments closed

Empty Strings vs NULL

Erik Darling lays out a challenge:

Empty Strings Aren’t Better Than NULLs

Prove me wrong.

Click through for commenters’ perspectives.

I’m not at the C.J. Date level of “NULLs are a mockery of the entire relational system and should be burned to the ground and that ground be salted and that salt be burned, just in case” but I do have my sympathies there.

An empty string has meaning: the value of a given attribute in this tuple is an empty string. NULL has no value and therefore violates first normal form (which, unlike my joking histrionics above, is an argument Date lays out in far too much detail for me to include here).

So what if you don’t know the value of a thing? Then the answer is, don’t store it! If there are certain attributes which may be missing at insertion time and are nonetheless relevant to maintain, use 6th Normal Form for each of those attributes. Then, the existence of a record indicates that it has a value and the lack of existence indicates that there is no value as of this time. This is quite different from NULL, about which we can only say “I don’t know if there is a value at this time, but if you’d like, I can do weird things with some of your queries if you happen to forget about this non-information.”

3 Comments

Designing and Managing Large Datasets in Power BI

Paul Turley continues a series on doing Power BI the right way:

I was just talking to a consulting client about the best approach to build a data model and he told me something very interesting about the way they were loading data into Power BI. He said “We don’t use facts and dimensions, we load all of our data into one huge table.” He said that their data model performs well and that it meets their reporting needs. It is a difficult point to argue, when something is working at the time although the design might not follow the accepted rules. Life is like that and there are plenty of analogies to make the point that a practice, even a real bad practice, might solve a problem for a period of time and under certain conditions. <analogy>You can drive a car at excessive speed to get to your destination faster. You might not get caught by the police on that day and you might not crash but eventually, if you make it a habit, this practice will catch up to you.</analogy> Data is like that. If you don’t play by the rules, you limit your options. Bending the rules lets you move faster and sometimes with less hassle. But, as the project scope expands – and after adding enough data or other complexities to the solution, it will not endure. The data model won’t perform well, won’t load the correct data or it just won’t be reliable.

This post will explore the realities of best practice design for large data models; some important considerations and trade-off decisions when working with both “big data” and “large data”.

Read on for Paul’s tips.

Comments closed

Data Modeling Essentials in Power BI

Paul Turley continues a series on doing Power BI the right way:

One of the most important lessons I have learned about data modeling over the past 20+ years is that there isn’t one model that fits all business needs. However, there are simple patterns we can follow to model data according to different business needs. Each pattern or schema has a different set of rules. At a very high level, we can think of these schemas in three different categories.

This is the 101 level course, but it’s good to get a refresher on the fundamentals before jumping into the complicated part.

Comments closed

Azure Purview: External Connections and the Starter Kit

Wolfgang Strasser continues a series on Azure Purview. First up is a look at external connections:

During my tests with Azure Purview I found out, that in my demo accounts (I created multiple to test it) the Management menu was missing some items.

Read on to learn why and how you can rectify this. Then, check out Wolfgang’s take on the Starter Kit:

Very often, when I want to test some new services I miss some infrastructure and environments I can start and play with. I am not talking about creating a new Azure Purview account (see my previous blog post – Creating an Azure Purview account) – I am talking about the data infrastructure to analyze, catalog and gain knowledge out of it.

We could start to create such an infrastructure, BUT: the Purview team create a Starter Kit to quickly create a data estate and configure everything that you can start with Purview within a view minutes.

Read on for one issue (of the self-inflicted variety) Wolfgang ran into during deployment. But it does look like a great way to get started with Purview and build up a relevant demo environment.

Comments closed

Power BI the Right Way: Separating Data Models and Reports

Paul Turley continues a series on doing Power BI the right way:

Back in the day, when we created BI solutions, reports and data models were separate. If you created a cube or Tabular model with Analysis Services, it was developed with Visual Studio and deployed to a server. Reports could be authored and deployed to the report server, separately. Now, with Power BI Desktop, you author your data model and report in the same development space and then deploy the whole kit and kaboodle to the service at once.

Secretly, Power BI actually separates the report from the data model (called a dataset in the service) and gives them both the same name. This is very convenient for self-service projects because it is quick and easy to make changes to the data model, queries and measures if you need to make a report enhancement. This is all well and good for small, one developer projects but what about those larger scale solutions where the data model and reports are developed by different folks, or when multiple reports are connected to the same data model?

At what point does it make sense to separate the data model and reports into separate files?

Read on and let Paul illuminate.

Comments closed

The Power BI Field Finder

Stephanie Bruno has updated a useful tool:

If you’re like me, building a data model in Power BI is an iterative process.  Sometimes you have to try out different ways of writing measures before you hit on the one that’s right.  You end up with temporary measures that don’t actually end up being used in visuals.  You may also pull in more columns than you might end up needing, just in case.  When you’ve finally finished your masterpiece with measures and visuals, there are probably quite a few that you don’t need.  Two problems with this are that having extraneous columns and measures (1) can slow down your model and (2) can make it more difficult to maintain.  You may also want to know where on your report a change to a measure will have an impact.

Click through for a demonstration of the solution.

Comments closed

Finding Unused Columns in Power BI Data Models

Matt Allington wants to trim the fat:

I have a saying in Power BI. Load every column you need, and nothing that you don’t need. The reason for this advice is that columns can make your data model bigger and less performant. You will of course need some columns in your data model for different purposes. Some are used for defining measures and some are used for slicing, dicing and summarising your data in the various visuals. But it is very common for people to load everything from the source, meaning that some of the columns are likely to be loaded but not used. Once the data model is ready and the reporting is done, it can be beneficial to remove the columns that are not being used and are not likely to be used for ad hoc reporting in the near future. The question is – how do you find the columns not being used? This is where Imke’s Power BI Cleaner tool comes in; I will show you how to use it below.

Read on for Seven Minute Abs for your Power BI data model.

Comments closed

Validating Data Model Results

Paul Turley continues a discussion on Power BI data model validation:

We often have users of a business intelligence solution tell us that they have found a discrepancy between the numbers in a Power BI report and a report produced by their line-of-business (LOB) system, which they believe to be the correct information.

Using the LOB reports as a data source for Power BI is usually not ideal because at best, we would only reproduce the same results in a different report. We typically connect to raw data sources and transform that detail data, along with other data sources with historical information to analyze trends, comparisons and ratios to produce more insightful reports.

However, if the LOB reports are really the north star for data validation, these can provide an effective means to certify that a BI semantic model and analytic reports are correct and reliable.

Click through for more details.

Comments closed

An Introduction to Data Vault

Tino Zishiri walks us through the basics of the Data Vault modeling technique:

The Data Vault methodology also addresses a common limitation that relates to the dimensional model approach. There are many good things to say about dimensional modelling, it’s a perfect fit for doing analytics, it’s easy for business analysts to understand, it’s performant over large sets of data, the list goes on.

That said, the data vault methodology addresses the limitations of having a “fixed” model. Dimensional modelling’s resilience to change or “graceful extensibility”, as some would say, is well documented. It’s capable of handling changing data relationships which can be implemented without affecting existing BI apps or query results. For example, facts consistent with the grain of an existing fact table can be added by creating new columns. Moreover, dimensions can be added to an existing fact table by creating new foreign key columns, presuming they don’t alter the fact table’s grain.

The most interesting thing to me about Data Vault is that it’s very popular in Europe and almost unheard-of in North America. That’s the impression I get, at least.

Comments closed