Warehousing – Page 14

Data Architecture Questions to Ask

Published 2022-07-27 by Kevin Feasel

As an example, if I were asked what product to use to store data in the Azure cloud, I could come up with at least a dozen options, so I need to ask questions to reduce the choices to the best use case for the customers situation. This will avoid what I have seen many times – a company chooses a particular product and after their solution is built, they say the product is “terrible”, but they were using it for a use case that it was not designed for. But the customer was not aware of a better product for their use case because “they don’t know what they don’t know”. That is why you should work with an architect expert as one of your first order of business: the technology decisions at this early part of building a solution are vital to get correct, as finding out 6-months or one year later that you made the wrong choice and have to start over can lead to so much wasted time and money (and I have seen some shocking waste).

Read on for a slew of questions.

Comments closed

Creating a Snowflake Instance

Published 2022-07-27 by Kevin Feasel

Arun Sirpal sets up Snowflake:

Now let’s start the process of creating a snowflake account in the Azure Cloud. You can sign up for a free trial from here – https://signup.snowflake.com/ I am going to bypass this and go straight to the setup screens. (This is slightly different because as an org-admin I have the power to create accounts)
Select the cloud provider and edition you require; we have already discussed these options before. You know me, its going to be Azure but feel free to dive into AWS or GCP.

Read on for some step-by-step installation instructions.

Comments closed

The Basics of Slowly Changing Dimensions

Published 2022-07-21 by Kevin Feasel

Soheil Bakhshi explains what slowly changing dimensions are:

Another example is when a customer’s address changes in a sales system. Again, the customer is the same, but their address is now different. From a data warehousing standpoint, we have different options to deal with the data depending on the business requirements, leading us to different types of SDCs. It is crucial to note that the data changes in the transactional source systems (in our examples, the HR system or a sales system). We move and transform the data from the transactional systems via extract, transform, and load (ETL) processes and land it in a data warehouse, where the SCD concept kicks in. SCD is about how changes in the source systems reflect the data in the data warehouse. These kinds of changes in the source system do not happen very often hence the term slowly changing. Many SCD types have been developed over the years, which is out of the scope of this post, but for your reference, we cover the first three types as follows.

Click through for depictions of the first three types as well as implementation details and pains.

Comments closed

The Basics of Snowflake Architecture

Published 2022-07-20 by Kevin Feasel

Arun Sirpal lays out the foundation of Snowflake DB’s architecture:

At the most basic level, Snowflake has 3 important components. The Cloud services layer, centralised storage layer and the compute layer.
Cloud services – they call this the “brains” of snowflake. This is where infrastructure management takes place, the optimiser is based (cost-based), metadata management and security (authentication and access control) are handled.

Read on to learn about the other two layers and how they meet.

Comments closed

The Importance of a Proper Datamart / Data Warehouse

Published 2022-07-19 by Kevin Feasel

Teo Lachev explains why you want a datamart (or a data warehouse) for BI solutions:

I sent a proposal for implementing a classic BI solution: Azure SQL-based datamart (not Power BI datamart please), ETL, semantic model, and reports. The client had a sticker shock. Return to sender … as other BI companies that quoted can do it for half! Upon digging, it turned out the other companies would build the semantic model (aka Power BI dataset) directly on top of the data source. On a T&M basis, of course, what else? By contrast, I give fixed-price milestone-driven proposals and I don’t get paid unless I deliver and meet written and agreed upon success criteria, but that’s a different story.
So, let me count the ways as the poet would say. It’s certainly technically possible to slap a dataset on top of the data source(s). That’s what self-service BI is all about right … until it doesn’t serve anymore

Read on for more detail.

Comments closed

Have One Data Model per Business Area

Published 2022-07-15 by Kevin Feasel

James McGillivray offers us an important piece of advice:

I cannot stress this enough. If people are consuming your data in multiple places, the data needs to come from the same data model. That can be an Enterprise Data Warehouse, a Data Mart, a Power BI Model, or any other data source, but at some point you need to be able to track the data back to a single place. If you don’t do this, you will spend THE REST OF YOUR DAYS explaining the differences between the data models to business and customers, and reconciling the differences over and over again.

Read on to learn why this is so important.

Comments closed

Choosing among Snowflake Editions

Published 2022-07-13 by Kevin Feasel

Arun Sirpal helps us make a choice:

You have decided that snowflake is the technology you want to use to build your next gen data platform, you have decided your cloud provider (Azure, AWS, GCP) then next you need to think about what edition of snowflake suits your business needs?

Read on to see what the major differences are between editions.

Comments closed

Power BI as an Enterprise Data Warehouse

Published 2022-07-01 by Kevin Feasel

James Serra follows Betteridge’s Law of Headlines:

With Power BI continuing to get many great new features, including the latest in Datamarts (see my blog Power BI Datamarts), I’m starting to hear customers ask “Can I just build my entire enterprise data warehouse solution in Power BI”? In other words, can I just use Power BI and all its built-in features instead of using Azure Data Lake Gen2, Azure Data Factory (ADF), Azure Synapse, Databricks, etc? The short answer is “No”.

Read on to understand why Power BI shouldn’t be your data warehouse.

Comments closed

Semantics Layers for Data Lakehouses

Published 2022-06-27 by Kevin Feasel

Jans Aasman explains why semantic modeling is so important for a data lakehouse:

Data lakehouses would not exist — especially not at enterprise scale — without semantic consistency. The provisioning of a universal semantic layer is not only one of the key attributes of this emergent data architecture, but also one of its cardinal enablers.
In fact, the critical distinction between a data lake and a data lakehouse is that the latter supplies a vital semantic understanding of data so users can view and comprehend these enterprise assets. It paves the way for data governance, metadata management, role-based access, and data quality.

For a deeper dive into the topic, Kyle Hale has a post covering this with Databricks and Power BI as examples.

Comments closed

Power BI Field Parameters and Type 2 SCDs with Bonus Fields

Published 2022-06-20 by Kevin Feasel

Koen Verbeeck extends the type 2 slowly changing dimension:

Power BI field parameters are a new feature in Power BI Desktop, and it’s one of the best of the past months. In short, Power BI field parameters allow you to easily switch between dimensions attributes or measures in a filter. Previously, you had to do all sorts of DAX wizardry to make this happen, but now it’s just a couple of clicks.
The goal of this blog post is not to tell you exactly how they work, but rather showcase an interesting use case. You can find more info about Power BI field parameters in the official blog post, but also here, here and here. The use case I’m talking about is slowly changing dimensions of Type 2, you know, the one where we insert a record for every change. Often, I also include an extra column for each column of which we’re tracking history: the “current value column”. For example, if we keep history of the department for an employee, I have a column “CurrentDepartment”. If a type 2 change occurs, the values of this columns are updated to the last known value for this dimension member. This allows to answer different types of questions, because sometimes users are interested in the historical values, but sometimes they just want to know the current value.

Read on for the use case as well as how you might combine field parameters with the idea of current values on type-2 slowly changing dimensions.

Comments closed

M	T	W	T	F	S	S
				1	2	3
4	5	6	7	8	9	10
11	12	13	14	15	16	17
18	19	20	21	22	23	24
25	26	27	28	29	30	31

Category: Warehousing