Press "Enter" to skip to content

Category: Data Modeling

Building a Report Model from Agile User Stories

Kelly Broekstra digs into a story:

Are you a model designer or BI developer tasked with building a data model and/or report from a series of user requirements or Agile User Stories? Do you know where to start, or what to try first? Here are some practical tips and techniques that I have used to design a great report model from Agile user stories.

Read on for questions to ask and how to translate the answers into a star schema model.
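
To make that concrete, the end state is typically one fact table keyed to a handful of dimensions. Here is a minimal sketch of where a user story like "as a sales manager, I want to see sales by customer and month" might land; the table and column names are purely my own illustration, not anything from Kelly's post:

    -- Minimal star schema sketch (illustrative names, not from the linked post).
    CREATE TABLE dim_customer (
        customer_key  INT IDENTITY(1,1) PRIMARY KEY,
        customer_name VARCHAR(100) NOT NULL,
        region        VARCHAR(50)  NOT NULL
    );

    CREATE TABLE dim_date (
        date_key      INT PRIMARY KEY,   -- e.g. 20240131
        calendar_date DATE NOT NULL,
        month_name    VARCHAR(20) NOT NULL,
        calendar_year INT NOT NULL
    );

    CREATE TABLE fact_sales (
        date_key     INT NOT NULL REFERENCES dim_date (date_key),
        customer_key INT NOT NULL REFERENCES dim_customer (customer_key),
        quantity     INT NOT NULL,
        sales_amount DECIMAL(18,2) NOT NULL
    );

Each story tends to contribute measures to the fact table and attributes to the dimensions, which is why the questions you ask up front matter so much.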


The Concept of Schema in Relational Databases

Adron Hall explains how different relational database management systems describe schemas:

From the viewpoint of someone familiar with the general idea of a schema, it can indeed seem unusual that databases like SQL Server, Oracle, MariaDB/MySQL, and PostgreSQL each interpret and implement schemas in slightly (or sometimes, vastly) different ways. While the core idea behind a schema as a structured container or namespace for database objects remains somewhat consistent, the exact nature, utility, and behavior of schemas vary across these systems.

Read on for an overview of these four products, as well as what the ANSI standard indicates.
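
To give a flavor of the differences before you dive in, here is roughly how each product spells it, with placeholder object names of my own:

    -- SQL Server and PostgreSQL: a schema is a namespace inside a database.
    CREATE SCHEMA sales;
    CREATE TABLE sales.orders (order_id INT PRIMARY KEY);

    -- MySQL / MariaDB: CREATE SCHEMA is a synonym for CREATE DATABASE,
    -- so "schema" and "database" are effectively the same object.
    CREATE SCHEMA sales;

    -- Oracle: a schema is bound to a user, so you create it by creating the user.
    CREATE USER sales IDENTIFIED BY some_password;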


The Medallion Architecture in Data Modeling

Nikola Ilic gets the gold:

The most common pattern for modeling the data in the lakehouse is called a medallion. I love this name – it’s really easy to remember. But, why medallion? Tag along and you’ll soon find out why.

The same as for the lakehouse concept, credit for being pioneers in the medallion approach goes to Databricks.

What I’ve found interesting is the number of people who have taken to disliking the medallion architecture terminology because Databricks pushed it so hard that their clients automatically assumed “medallion = using Databricks.”
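
If the pattern is new to you, the short version is that raw data lands in a bronze layer, gets cleaned and typed into silver, and is reshaped for consumption in gold. A rough sketch, with table names and transformations of my own invention rather than anything prescribed by Databricks:

    -- Bronze: raw data landed as-is from the source system.
    CREATE TABLE bronze_orders (
        raw_order_id VARCHAR(50),
        raw_amount   VARCHAR(50),
        loaded_at    TIMESTAMP
    );

    -- Silver: cleaned, typed, and de-duplicated.
    CREATE TABLE silver_orders AS
    SELECT DISTINCT
        CAST(raw_order_id AS INT)           AS order_id,
        CAST(raw_amount   AS DECIMAL(18,2)) AS amount,
        CAST(loaded_at    AS DATE)          AS order_date
    FROM bronze_orders
    WHERE raw_order_id IS NOT NULL;

    -- Gold: aggregated and shaped for reporting.
    CREATE TABLE gold_daily_sales AS
    SELECT order_date, SUM(amount) AS total_amount
    FROM silver_orders
    GROUP BY order_date;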


A Story of Bad Data Modeling

Kendra Little unravels a puzzle:

I double-checked my queries. Had I goofed in my sql? Nope. Next, I looked into if some of the data was in an inconsistent state.

What I found was worse than what I’d imagined. As a data person, it made me feel sad and icky.

That’s because it’s usually not too hard to clean up bad data. It’s almost always much harder to fix a badly designed data model which is already established in production.

Read on for a tale as old as time: the clarion call of expediency now causing pain later.


An Overview of Data Modeling

Nikola Ilic provides an overview of data modeling:

In recent years, I’ve done dozens of training sessions on various data platform topics, for all kinds of audiences. When teaching various data platform concepts and techniques, I find one of the concepts particularly intimidating for many business analysts, especially those who are just starting their journey. And, that is the concept of data modeling.

This is a good introduction and does a particularly good job of explaining why we have logical and physical data models. I have one medium-sized quibble with an otherwise-great article: 3rd Normal Form is nowhere near sufficient for a logical data model, and I’d make the strong case (in fact, I do make that case) that 5th Normal Form should be the standard and that 3NF is an anachronism which you should entirely replace with Boyce-Codd Normal Form.
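
For anyone who hasn't hit the gap between 3NF and BCNF in practice, the classic illustration (my example, not Nikola's) is a table whose business rules include a dependency that no key enforces:

    -- 3NF but not BCNF. Rules: each tutor teaches exactly one course;
    -- a student has at most one tutor per course.
    CREATE TABLE student_tutor (
        student_id INT         NOT NULL,
        course     VARCHAR(50) NOT NULL,
        tutor      VARCHAR(50) NOT NULL,
        PRIMARY KEY (student_id, course)
    );
    -- The dependency tutor -> course isn't backed by any key, so nothing stops
    -- the same tutor being recorded against two different courses. Decomposing
    -- into (tutor, course) and (student_id, tutor) gets you to BCNF.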


Computed Columns in Snowflake

Kevin Wilkie does the math:

Sometimes to make our lives easier, we, as database engineers, can create a table that automatically tells us the answer as we need it – or at least how we tell it we want it. In SQL Server, we create what is called “Computed Columns.”

Read on to see how to create one of these in Snowflake.
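
For comparison, the SQL Server version looks like the first statement below; as I understand it, Snowflake accepts a similar AS (expression) clause to define a virtual column, though you should treat my second statement as an assumption and check Kevin's post for the authoritative syntax. Table and column names are placeholders of mine.

    -- SQL Server: a computed column derived from other columns in the row.
    CREATE TABLE dbo.OrderLines (
        OrderID   INT           NOT NULL,
        UnitPrice DECIMAL(18,2) NOT NULL,
        Quantity  INT           NOT NULL,
        LineTotal AS (UnitPrice * Quantity) PERSISTED
    );

    -- Snowflake (assumed equivalent; see the linked post for the exact form).
    CREATE TABLE order_lines (
        order_id   INT,
        unit_price NUMBER(18,2),
        quantity   INT,
        line_total NUMBER(18,2) AS (unit_price * quantity)
    );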


DirectQuery Data Modeling

Jason Cockington shares some advice:

From my experience, most people who have reports built on a DirectQuery connection into their data source did so because of a lack of understanding of what the DirectQuery connection was designed to achieve.  For the vast majority of reports, Import mode is the best solution for working with data in Power BI.  DirectQuery should really only ever be applied when you are trying to solve one of the following challenges.

  1. Real-time Data – you need to see the latest available data from the source
  2. Huge Datasets – you have many billions of rows of data (more than 10Gb) so you just can’t import it into Power BI
  3. Regulatory Compliance – the data must stay in the source for data security/privacy reasons

Click through for more information.


PayPal’s Data Contract Template Open Sourced

Jean-Georges Perrin makes an announcement:

A data contract is a binding agreement between the consumers and producers of data. You can see it as a data schema on steroids or data schema++. The goal of the contract is to set expectations between the parties. It can be built as fit-for-purpose where the consumers and producer agree on what it should contain or can serve as a brochure for any consumer willing to access the data offered by this (data) product.

Click through to learn more about data contracts and then check out the contract template itself on PayPal’s GitHub repo.


Route Planning in Postgres

Mark Litwintschik plans a journey:

I recently came across a transit route feed aggregator called Transitland. They list feeds from 2,500 operators in 55+ countries around the world. Among these feeds is one for FlixBus, a 12-year-old coach service provider. Below is a route map of their European destinations.

In this post, I’ll import their feed into PostgreSQL, build visualisations of their routes and plan a bus trip from Vienna to Oslo.

Read on for the process.
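
GTFS feeds are zipped CSV files (stops.txt, routes.txt, trips.txt, stop_times.txt and so on), so the import side is mostly plain COPY work. A minimal sketch for the stops file, assuming you define the table to match exactly the columns in your feed's header, since real feeds often carry extra fields; the file path and column set here are placeholders of mine, not Mark's:

    -- Illustrative only: trim or extend the column list to match stops.txt.
    CREATE TABLE gtfs_stops (
        stop_id   TEXT PRIMARY KEY,
        stop_name TEXT,
        stop_lat  DOUBLE PRECISION,
        stop_lon  DOUBLE PRECISION
    );

    \copy gtfs_stops FROM 'stops.txt' WITH (FORMAT csv, HEADER true)

    -- Rough sanity check: the stops closest to central Vienna.
    SELECT stop_id, stop_name
    FROM gtfs_stops
    ORDER BY (stop_lat - 48.2082)^2 + (stop_lon - 16.3738)^2
    LIMIT 5;

Mark's post goes well beyond this, building route visualisations and the actual Vienna-to-Oslo trip plan.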


Recommendations for Dedicated SQL Pool Data Modeling

Bhaskar Sharma has some advice:

In this article, I will discuss how to physically model an Azure Synapse Analytics data warehouse while migrating from an existing on-premises MPP (Massively Parallel Processing) data warehouse solution like Teradata and Netezza. The approach and methodologies discussed in this article are purely based on the knowledge and insight I have gained while migrating these data warehouses to Azure Synapse dedicated SQL pool.

Dedicated SQL pools are close enough to regular SQL Server that we make a lot of assumptions about them, some of which may be wrong.
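
The biggest physical-design difference is that every table in a dedicated SQL pool declares how its rows spread across the 60 distributions, something box SQL Server never asks of you. A quick sketch with illustrative names:

    -- Large fact table: hash-distribute on a common join key.
    CREATE TABLE dbo.FactSales
    (
        SaleDate    DATE          NOT NULL,
        CustomerKey INT           NOT NULL,
        SalesAmount DECIMAL(18,2) NOT NULL
    )
    WITH
    (
        DISTRIBUTION = HASH(CustomerKey),
        CLUSTERED COLUMNSTORE INDEX
    );

    -- Small dimension: replicate a full copy to every compute node.
    CREATE TABLE dbo.DimCustomer
    (
        CustomerKey  INT          NOT NULL,
        CustomerName VARCHAR(100) NOT NULL
    )
    WITH (DISTRIBUTION = REPLICATE);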
