Category: Warehousing

Managed Self-Service BI in Power BI

Gogula Aryalingam has started a series on managed self-service BI. Part 1 provides an overview of the topic:

When putting together a business intelligence strategy using Power BI, Microsoft recommends three primary strategies that an organization can adopt. Out of these, the one that I tend to go with is managed self-service BI, which brings forth the concept of discipline at the core, flexibility at the edge. This concept is the dominant strategy used for BI at Microsoft itself, and is explained very nicely in this article. It’s my personal favorite, because I find it an effective means of onboarding customers once the core platform is built with the required standards (discipline), and then helping them adopt the solution from the edge, thus providing them with the best of both worlds.

Part 2 takes us to the edge:

Now, what happens when an analyst, for instance, has a set of sales target spreadsheets and wants to compare the figures with sales metrics so that salespeople’s performances can be measured? It certainly needs a new dataset. However, flexibility at the edge has to prevail in the right way. This post will look at how we can go about this keeping to discipline at the core, flexibility at the edge.

Note: The analyst’s requirement is currently local to their group or department. It has not yet been made an organizational requirement. That’s how most requirements start out: a requirement at the departmental level, and then when enough people start reaping the benefits within and outside of the department, it can get absorbed into the core.

Part 3 returns to the core:

One problem that we may have overlooked when building a bunch of core datasets in that post is that certain dimensions tend to duplicate across the datasets. Imagine a scenario where the single master data source of a managed self-service setup is a data warehouse, which sources all the required dimensions. When you have, for example, core reseller sales, internet sales, and finance datasets, each one will have a calendar dimension and a few others created in each of these datasets. This is not ideal if you think about the extent of the duplication and effort that is required.

This is where, once again, using DirectQuery for Power BI datasets and Analysis Services comes into play, where you could draw up a layered core dataset architecture. If we take the example of AdventureWorks’ fact tables in the data warehouse (single master data source) you can figure out what the business processes are.

Read on for Gogula’s thoughts. I think there’s a lot going for this particular strategy, especially in a large organization with hundreds (or thousands) of people actively using Power BI. At that point, doing everything through a central IT organization doesn’t scale very well.

Comments closed

Column Exclusion and Rename in Snowflake

Kevin Wilkie plays duck-duck-goose with columns:

With Snowflake, we can do many different things that we’re not used to seeing with a SELECT statement. We’re all used to seeing this – SELECT * and it shows all kinds of columns.

With Snowflake, we can tell it NOT to show certain columns by using the EXCLUDE operator.

Read on to see how it works and the specific requirements around its use. In addition, Kevin shows a way to perform aliasing.
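
As a rough sketch of what column exclusion and aliasing look like in Snowflake SQL: the `customer_demo` table and its columns below are hypothetical and are not taken from Kevin’s post. The `EXCLUDE` and `RENAME` keywords are Snowflake extensions to `SELECT *`, so this will not run on most other engines.

```sql
-- Hypothetical table, used only for illustration.
CREATE OR REPLACE TABLE customer_demo (
    customer_id INTEGER,
    first_name  VARCHAR,
    last_name   VARCHAR,
    ssn         VARCHAR,
    created_at  TIMESTAMP_NTZ
);

-- Return every column except one.
SELECT * EXCLUDE ssn
FROM customer_demo;

-- Exclude several columns by wrapping them in parentheses.
SELECT * EXCLUDE (ssn, created_at)
FROM customer_demo;

-- Alias a column while still selecting everything else.
SELECT * RENAME (customer_id AS id)
FROM customer_demo;

-- Combine both: EXCLUDE has to come before RENAME.
SELECT * EXCLUDE ssn RENAME (customer_id AS id)
FROM customer_demo;
```

The main appeal is that the query keeps working when new columns are added to the table, which is not true of a hand-written column list.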

Defining an Analytics Engineer

Ust Oldfield defines a term:

Analytics Engineering, along with Data Engineering and Report Engineering, is a specialised subset of skills that would previously be the preserve of a Business Intelligence (BI) Developer. The BI Developer was once a generalist data developer, whose overall responsibilities have been split out and shared among specialist developers as the prevalence of data across organisations has increased and the tools and technologies used to ingest, transform, and serve data have become more specialised and loosely integrated.

In the same way that Data Engineering borrowed and took inspiration from Software Engineering for applying repeatable and scalable patterns and techniques to the pipelines that ingest and cleanse data, as well as the rigorous testing of those pipelines, Analytics Engineering has borrowed and taken inspiration from Software Engineering too.

Click through for the specifics of what an Analytics Engineer does.

Redshift Query Editor v2

Anusha Challa, et al., announce a new version of a Redshift query editor:

Amazon Redshift is a fast, fully managed, petabyte-scale cloud data warehouse. You have the flexibility to choose from provisioned and serverless compute modes. You can start loading and querying large datasets conveniently in Amazon Redshift using Amazon Redshift Query Editor v2, a web-based SQL client application.

It’s worth a try if you’re a Redshift user, though I’d imagine that frequent Redshift users have already sorted out their IDEs of choice.

Storing Semi-Additive Facts as Timespans

Timo Zishiri gives a new spin to a common warehousing problem:

In these cases, the measure may be aggregated across dates by averaging over the number of periods, e.g., average daily inventory levels. Measures can also be aggregated across dates by taking the maximum/minimum for the time interval.

More specifically, this blog focuses on an alternative approach to providing end users with the ability to do point-in-time analysis, so-called trend analysis.

Click through to see how a timespan table would work.
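
To make the idea concrete, here is a minimal sketch of a timespan table, assuming one row per product for each stretch of time during which the stock level did not change. The table name, column names, and sample rows are hypothetical and are not taken from Timo’s post.

```sql
-- Hypothetical timespan table: one row per product per period of constant stock.
CREATE TABLE inventory_timespan (
    product_id INTEGER NOT NULL,
    valid_from DATE    NOT NULL,  -- first day the quantity applies (inclusive)
    valid_to   DATE    NOT NULL,  -- first day it no longer applies (exclusive)
    quantity   INTEGER NOT NULL
);

INSERT INTO inventory_timespan (product_id, valid_from, valid_to, quantity) VALUES
    (1, '2023-01-01', '2023-01-11', 100),
    (1, '2023-01-11', '2023-01-21', 40),
    (1, '2023-01-21', '9999-12-31', 75);  -- open-ended current row

-- Point-in-time analysis: the stock level of each product on a given date.
-- For 2023-01-15 this picks the middle row (quantity 40) for product 1.
SELECT product_id, quantity
FROM inventory_timespan
WHERE valid_from <= '2023-01-15'
  AND valid_to   >  '2023-01-15';
```

Compared with storing one row per product per day, this keeps the table small when levels change rarely, at the cost of a range predicate in every point-in-time query.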

Snapshot Fact Tables in a Data Warehouse

Alex Crampton explains how snapshot fact tables work in data warehousing:

The typical fact table measures activities and is known as a transaction fact table. These tables support a wide variety of analytic possibilities and can be used to capture detailed information about a particular process. Certain facts cannot be studied easily using this kind of design, if at all.

This blog will outline the characteristics of a transaction fact table vs those of a snapshot fact table, and when the need for a snapshot fact table arises.

Snapshot-based fact tables aren’t ideal for data load times (especially as the table gets large) but they are useful in specific circumstances, as Alex points out.
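
A minimal sketch of the difference, using a bank-account example that is my own assumption rather than anything from Alex’s post (all table and column names are hypothetical):

```sql
-- Transaction fact table: one row per activity.
CREATE TABLE fact_account_transaction (
    txn_date   DATE          NOT NULL,
    account_id INTEGER       NOT NULL,
    amount     DECIMAL(12,2) NOT NULL   -- fully additive
);

-- Periodic snapshot fact table: one row per account per day, whether or not
-- anything happened that day.
CREATE TABLE fact_account_daily_snapshot (
    snapshot_date DATE          NOT NULL,
    account_id    INTEGER       NOT NULL,
    balance       DECIMAL(12,2) NOT NULL  -- semi-additive
);

-- Amounts can be summed freely across dates.
SELECT account_id, SUM(amount) AS net_movement
FROM fact_account_transaction
WHERE txn_date >= '2023-01-01' AND txn_date < '2023-02-01'
GROUP BY account_id;

-- Balances should not be summed across dates; average (or take min/max) instead.
SELECT account_id, AVG(balance) AS avg_daily_balance
FROM fact_account_daily_snapshot
WHERE snapshot_date >= '2023-01-01' AND snapshot_date < '2023-02-01'
GROUP BY account_id;
```

The snapshot table answers "what was the balance on day X?" with a simple lookup, which is awkward to derive from transactions alone, but it grows by one row per account per day regardless of activity.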

Search Optimization in Snowflake

Arun Sirpal doesn’t have time to create indexes:

I will use a clone of the table to compare it to when search optimisation is on. I will make sure no caching is on, which could affect the test.
I activate the feature via:

ALTER TABLE data_staging ADD SEARCH OPTIMIZATION;

This takes time! That is because there is a maintenance service that runs in the background, responsible for creating and maintaining the search access path. You can run a query to confirm that the build has reached 100% completion.

Click through to see what happens and the kinds of performance gains Arun realized.
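
One way to check build progress in Snowflake, which is not necessarily the query Arun used, is to look at the search optimization columns in the `SHOW TABLES` output. The table name below comes from the excerpt; the rest is a sketch under that assumption.

```sql
-- Turn off the query result cache for this session so repeated test queries
-- are not answered from cached results.
ALTER SESSION SET USE_CACHED_RESULT = FALSE;

-- The output includes search_optimization (ON/OFF) and
-- search_optimization_progress (a percentage) for each table.
SHOW TABLES LIKE 'data_staging';

-- Optionally narrow the SHOW output to the relevant columns.
-- This must run immediately after the SHOW command in the same session.
SELECT "name", "search_optimization", "search_optimization_progress"
FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()));

-- To remove the feature from the table again:
-- ALTER TABLE data_staging DROP SEARCH OPTIMIZATION;
```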

Data Architecture Questions to Ask

James Serra does some thinking:

As an example, if I were asked what product to use to store data in the Azure cloud, I could come up with at least a dozen options, so I need to ask questions to reduce the choices to the best use case for the customer’s situation. This will avoid what I have seen many times – a company chooses a particular product and after their solution is built, they say the product is “terrible”, but they were using it for a use case that it was not designed for. But the customer was not aware of a better product for their use case because “they don’t know what they don’t know”. That is why you should work with an expert architect as one of your first orders of business: the technology decisions at this early part of building a solution are vital to get correct, as finding out six months or one year later that you made the wrong choice and have to start over can lead to so much wasted time and money (and I have seen some shocking waste).

Read on for a slew of questions.

Creating a Snowflake Instance

Arun Sirpal sets up Snowflake:

Now let’s start the process of creating a Snowflake account in the Azure Cloud. You can sign up for a free trial from here – https://signup.snowflake.com/. I am going to bypass this and go straight to the setup screens. (This is slightly different because as an org-admin I have the power to create accounts.)

Select the cloud provider and edition you require; we have already discussed these options before. You know me, it’s going to be Azure but feel free to dive into AWS or GCP.

Read on for some step-by-step setup instructions.

The Basics of Slowly Changing Dimensions

Soheil Bakhshi explains what slowly changing dimensions are:

Another example is when a customer’s address changes in a sales system. Again, the customer is the same, but their address is now different. From a data warehousing standpoint, we have different options to deal with the data depending on the business requirements, leading us to different types of SCDs. It is crucial to note that the data changes in the transactional source systems (in our examples, the HR system or a sales system). We move and transform the data from the transactional systems via extract, transform, and load (ETL) processes and land it in a data warehouse, where the SCD concept kicks in. SCD is about how changes in the source systems are reflected in the data in the data warehouse. These kinds of changes in the source system do not happen very often, hence the term slowly changing. Many SCD types have been developed over the years, and covering all of them is out of the scope of this post, but for your reference, we cover the first three types as follows.

Click through for depictions of the first three types as well as implementation details and pains.
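
For the address-change example, here is a hedged sketch of how two of the commonly described approaches differ: Type 1 (overwrite the value) and Type 2 (keep history by adding a row). The `dim_customer` table, its columns, and the sample values are hypothetical and are not Soheil’s implementation. The two approaches are alternatives, so a given attribute would normally use one or the other.

```sql
-- Hypothetical customer dimension with the extra columns Type 2 needs.
CREATE TABLE dim_customer (
    customer_key INTEGER      NOT NULL,  -- surrogate key, unique per row version
    customer_id  INTEGER      NOT NULL,  -- business key from the source system
    address      VARCHAR(200) NOT NULL,
    valid_from   DATE         NOT NULL,
    valid_to     DATE,                   -- NULL while the row is current
    is_current   BOOLEAN      NOT NULL
);

INSERT INTO dim_customer VALUES
    (1001, 42, '1 Old Street', '2020-01-01', NULL, TRUE);

-- Option A, Type 1: overwrite in place. No history is kept.
UPDATE dim_customer
SET address = '22 New Road'
WHERE customer_id = 42;

-- Option B, Type 2: expire the current row, then insert a new version.
-- (Shown as an alternative to Option A, not as a follow-up step.)
UPDATE dim_customer
SET valid_to = '2023-06-01', is_current = FALSE
WHERE customer_id = 42 AND is_current = TRUE;

INSERT INTO dim_customer VALUES
    (1002, 42, '22 New Road', '2023-06-01', NULL, TRUE);
```

With Type 2, fact rows keep pointing at the surrogate key that was current when they were loaded, so historical sales still report against the old address.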
