Press "Enter" to skip to content

Category: Data Modeling

An Overview of Fabric IQ

Brian Bonk talks ontologies:

If you followed along with the announcements from Microsoft Ignite, you might have stumbled upon the new Fabric IQ service.

For many people, it can be hard to see the point of this new service, so in this blog post I will try to help you understand its usage and business value.

Ontologies aren’t new—they’re mostly a metadata management exercise—but there are several companies (like Palantir) pushing this hard in their tools, and Microsoft is working that market segment. But instead of using all of this metadata management for data quality or master data management reasons, it’s for feeding into language models.


Thoughts on Data Modeling

Steve Jones has a two-fer. First up, he asks an opinion question about data modeling:

Recently, I had a few questions on database modeling. One was posted in the SQL Server Central forums, and a customer asked about ERD tooling on the same day. This came shortly after Redgate acquired Vertabelo (now Redgate Data Modeler). This stood out to me because very rarely in the last few years have I found people consulting and updating a diagram while performing database development.

Second, he takes a peek at a tool Redgate purchased:

Redgate acquired a data modeling tool from Vertabelo recently and I wanted to explore how it works. This is a short look at this tool and how it might be useful in working with databases.

My experience with data modeling has been that only the really large companies did a lot of work with upfront data modeling and keeping logical models up to date. It’s still quite useful for data warehouses, and that’s where the people I know who do a lot of data modeling make their living. But I find it’s too much of a hassle in fast-paced environments, especially when I can keep most or all of the data model in my head and I’m the person managing it all.

Essentially, data models are useful to the extent that they’re approximately true. But because they quickly get out of sync with reality, they quickly go from “quite useful” to “dirty lies.”


What-If Analysis in Power BI

Ben Richardson takes us through a what-if analysis:

What If Analysis is a modelling technique used to evaluate different outcomes by changing key input variables.

In Power BI, it uses What If parameters and dynamic DAX measures that recalculate outputs based on user input. Users can ask questions like:

  • “What if sales increase by 10%?”
  • “What if production costs drop by 5%?”

The parameters are created in the Modelling tab, where you define value ranges. Power BI automatically generates a slicer and a measure, which can then be used in DAX calculations to dynamically adjust metrics like revenue, cost, or profit.

Read on to see how it works, understanding that you have to provide the formulas for behavior. In other words, if your what-if parameter is around the unit price of some product, there is no built-in concept of price elasticity for the product. That’s something you’d have to implement yourself.
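To make that concrete, here is a minimal Python sketch (not DAX, and not from the linked post) of the kind of arithmetic a what-if measure ends up encoding, assuming a hypothetical linear price-elasticity factor that you would have to choose yourself:

def projected_revenue(base_units: float, base_price: float,
                      price_change_pct: float, elasticity: float = -1.5) -> float:
    """Estimate revenue after a price change, assuming a deliberately naive
    linear price-elasticity response. The elasticity value is a made-up assumption."""
    new_price = base_price * (1 + price_change_pct)
    # Naive linear model: a 1% price increase changes units sold by `elasticity` percent.
    new_units = base_units * (1 + elasticity * price_change_pct)
    return new_price * max(new_units, 0.0)

# "What if the unit price increases by 10%?"
print(f"Baseline revenue: {projected_revenue(1000, 20.0, 0.00):,.2f}")
print(f"+10% price:       {projected_revenue(1000, 20.0, 0.10):,.2f}")

In Power BI, the what-if parameter's slicer would supply something like price_change_pct, and the rest of this logic is the part you have to write yourself in DAX.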


Thoughts on Data Integrity

Deborah Melkin shares some thoughts:

The first way to think of data integrity is a very small and literal interpretation. This is making sure that our data in the database is good. In many ways, these are easy to enforce – you add constraints. Primary Keys ensure that you know what makes each row unique. Unique constraints represent what would make each record unique if the primary key constraint, which is often a surrogate key these days, didn’t exist or offer different options. 

Read on for more about database design, default constraints, and a dive into data modeling.
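As a small illustration of letting the database enforce integrity, here is a sketch using Python's built-in sqlite3 module; the table and column names are made up, but the pattern (a surrogate primary key plus a unique constraint on the natural key) mirrors what Deborah describes:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Customer (
        CustomerID   INTEGER PRIMARY KEY,   -- surrogate key
        CustomerCode TEXT NOT NULL UNIQUE,  -- natural/business key
        CustomerName TEXT NOT NULL
    )
""")
conn.execute("INSERT INTO Customer (CustomerCode, CustomerName) VALUES ('C001', 'Contoso')")

try:
    # Violates the unique constraint on the natural key.
    conn.execute("INSERT INTO Customer (CustomerCode, CustomerName) VALUES ('C001', 'Fabrikam')")
except sqlite3.IntegrityError as e:
    print(f"Rejected by the database: {e}")

The point is that the constraint fires regardless of which application (or careless ad hoc script) tries to insert the bad row.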


Custom SCD2 with PySpark

Abhishek Trehan creates a type-2 slowly changing dimension:

A Slowly Changing Dimension (SCD) is a dimension that stores and manages both current and historical data over time in a data warehouse. Implementing one is considered among the most critical ETL tasks, since it tracks the history of dimension records.

SCD2 is a dimension that stores and manages current and historical data over time in a data warehouse. The purpose of an SCD2 is to preserve the history of changes. If a customer changes their address, for example, or any other attribute, an SCD2 allows analysts to link facts back to the customer and their attributes in the state they were in at the time of the fact event.

Read on for an implementation in Python.
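If you want a feel for the overall shape of the pattern before reading the post, here is a minimal PySpark sketch of an SCD2 load. It is not the author's implementation, and the table, column, and key names are assumptions for illustration:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("scd2-sketch").getOrCreate()

# Current dimension table: one active version per business key (customer_id).
existing = spark.createDataFrame(
    [(1, "Alice", "Old Street", "2020-01-01", "9999-12-31", True)],
    ["customer_id", "name", "address", "start_date", "end_date", "is_current"],
)
# Latest snapshot from the source system.
incoming = spark.createDataFrame(
    [(1, "Alice", "New Street"), (2, "Bob", "First Avenue")],
    ["customer_id", "name", "address"],
)
load_date = F.current_date().cast("string")

# 1. Find current rows whose tracked attribute(s) changed.
changed = (existing.alias("e")
    .join(incoming.alias("i"), F.col("e.customer_id") == F.col("i.customer_id"))
    .where((F.col("e.address") != F.col("i.address")) & F.col("e.is_current")))

# 2. Expire the old versions of those rows.
expired = (changed.select("e.*")
    .withColumn("end_date", load_date)
    .withColumn("is_current", F.lit(False)))

# 3. Insert new versions for changed rows and for brand-new business keys.
new_keys = incoming.join(existing, "customer_id", "left_anti")
new_versions = (changed.select("i.customer_id", "i.name", "i.address")
    .unionByName(new_keys)
    .withColumn("start_date", load_date)
    .withColumn("end_date", F.lit("9999-12-31"))
    .withColumn("is_current", F.lit(True)))

# 4. Keep every row we did not touch (history plus unchanged current rows).
untouched = existing.join(changed.select("e.customer_id"), "customer_id", "left_anti")

scd2_result = untouched.unionByName(expired).unionByName(new_versions)
scd2_result.orderBy("customer_id", "start_date").show()

The idea is the same regardless of naming: expire the current version of any changed row, insert a new current version, and leave everything else untouched.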


Implementing Role-Playing Dimensions in Power BI

Teo Lachev puts on a mask:

Role-playing dimensions are a popular business requirement, yet they are challenging to implement in Power BI (and Tabular) due to a long-standing limitation that two tables can’t be joined multiple times with active relationships. Declarative relationships are both a blessing and a curse and, in this case, we are confronted with their limitations. Had Power BI allowed multiple active relationships, the user would have to be prompted for which path to take. Interestingly, a long time ago Microsoft considered a user interface for that prompting but dropped the idea for unknown reasons.

Given the existing technology limitations, you have two choices for implementing subsequent role-playing dimensions: duplicating the dimension table (either in the DW or the semantic model) or denormalizing the dimension fields into the fact table. The following table presents the pros and cons of each option:

Click through for that table, as well as some thoughts on viable approaches, including an edge case.
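As a language-neutral illustration of the first option (duplicating the dimension), here is a small pandas sketch outside of Power BI, with made-up table and column names, where the same calendar table plays both the order-date and ship-date roles:

import pandas as pd

calendar = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-01", "2024-01-05"]),
    "month_name": ["January", "January"],
})
fact_sales = pd.DataFrame({
    "order_date": pd.to_datetime(["2024-01-01"]),
    "ship_date": pd.to_datetime(["2024-01-05"]),
    "amount": [100.0],
})

# Role 1: the calendar acting as the "order date" dimension.
order_dim = calendar.add_prefix("order_")
# Role 2: the same calendar acting as the "ship date" dimension.
ship_dim = calendar.add_prefix("ship_")

result = (fact_sales
    .merge(order_dim, on="order_date", how="left")
    .merge(ship_dim, on="ship_date", how="left"))
print(result)

In Power BI, the rough equivalent is keeping two copies of the Date table, while the second option lands those date attributes directly on the fact table; those are the trade-offs Teo's table walks through.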


Tips for Optimizing Power BI Semantic Models

Koen Verbeeck shares some tips:

Power BI is designed to be user-friendly. With just a few clicks, you can import data from various sources, combine them in one data model, and start analyzing it using powerful data visualizations. This sometimes leads to a scenario where people are just importing data into the tool without giving it too much thought. When you’re working on a solo project on a small dataset, there probably won’t be too many issues. But what if your report is successful and you want to share it with your colleagues and maybe other departments? Or more data is loaded into the model, and refreshes are taking more and more time? Or other data sources are added to your model, writing DAX formulas becomes hard, and reports are slowing down.

In this article, we’ll cover a couple of tricks that will help you make your Power BI models smaller, faster and easier to maintain. In the immortal words of Daft Punk: “Harder. Better. Faster. Stronger”.

Click through for those tricks and tips.


Microsoft Purview Classifications and Sensitivity Labels

James Serra labels the data:

I see a lot of confusion on how classifications and sensitivity labels work in Microsoft Purview. This blog will help to clear that up, but I first must address the confusion with Purview now that multiple products have been renamed to Microsoft Purview. I decided to use a question-and-answer format that will hopefully clear up the confusion (I was very confused too!):

Purview is a fantastic product. I just wish it cost about 10% as much as it does; then I could heartily recommend it to people.


Microsoft Fabric and Semantic Models

Kurt Buhler has a choose-your-own-adventure story:

Semantic models are integral to Microsoft Fabric. They use and are used by many of the different workloads. In Fabric, there are more items that can connect to and consume your model—such as semantic link in notebooks. Because of these new options and tools, your model is exposed to additional types of users who will use it in different ways. As such, it’s important that you make good models that you manage well throughout their entire lifecycle.

Read on for more information and three separate scenarios.


Using Schema Registry for Data Quality in Apache Kafka

Kai Waehner talks data quality:

Good data quality is one of the most critical requirements in decoupled architectures, like microservices or data mesh. Apache Kafka became the de facto standard for these architectures. But Kafka is a dumb broker that only stores byte arrays. The Schema Registry enforces message structures. This blog post looks at enhancements to leverage data contracts for policies and rules to enforce good data quality at the field level, plus advanced use cases like routing malicious messages to a dead letter queue.

Click through to learn more about the topic. This focuses a lot on the “why” and “what” but does have an example of “how” in there as well.
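As a rough sketch of the underlying idea (validate each message against a contract and divert violations to a dead letter queue), here is a small Python example using the jsonschema library rather than the Confluent Schema Registry itself; the topic names, the schema, and the route_message helper are assumptions for illustration only:

import json
from jsonschema import validate, ValidationError

ORDER_SCHEMA = {
    "type": "object",
    "properties": {
        "order_id": {"type": "string"},
        "amount": {"type": "number", "minimum": 0},
    },
    "required": ["order_id", "amount"],
    "additionalProperties": False,
}

def route_message(message: dict) -> tuple[str, bytes]:
    """Return (topic, payload): valid messages go to the main topic,
    contract violations go to a dead letter queue topic."""
    payload = json.dumps(message).encode("utf-8")
    try:
        validate(instance=message, schema=ORDER_SCHEMA)
        return ("orders", payload)
    except ValidationError:
        return ("orders.dlq", payload)

print(route_message({"order_id": "A-1", "amount": 99.5}))  # -> ('orders', ...)
print(route_message({"order_id": "A-2", "amount": -5}))    # -> ('orders.dlq', ...)

A registry-backed setup pushes this enforcement to the serializer and broker side, but the contract-plus-dead-letter-queue shape is the same.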
