Press "Enter" to skip to content

Category: Power BI

Aggregations In Power BI

Teo Lachev takes us on a tour of aggregates in Power BI:

During the “Building a data model to support 1 trillion rows of data and more with Microsoft Power BI Premium” presentation at the Business Applications Summit, Microsoft discussed the technical details of how the forthcoming “Aggregations” feature can help you implement fast summarized queries on top of huge datasets. Following incremental refresh and composite models, aggregations are the next “pro” feature to debut in Power BI, and they aim to make it a more attractive option for deploying organizational semantic models. In this blog, I summarize my initial observations of this feature, which should be available for preview in the September release of Power BI.

Aggregations are not a new concept to BI practitioners tackling large datasets. Ask a DBA what the next course of action is once all tricks to speed up massive queries are exhausted, and the answer will be summarized tables. That’s what aggregations are: predefined summaries of data, aimed at speeding up queries at the expense of more storage. BI pros will recall that Analysis Services Multidimensional (MD) has supported aggregations for a long time. Once you define an aggregation, MD maintains it automatically: when you process the partition, MD rebuilds the partition aggregations. An MD aggregation is tied to the source partition, and it summarizes all measures in the partition. You might also recall that designing proper aggregations in MD isn’t easy, and that MD intra-dependencies could cause some grief, such as processing a dimension invalidating the aggregations in the related partitions and requiring you to reprocess their indexes to restore the aggregations. On the other hand, as it stands today, Analysis Services Tabular (Azure AS and SSAS Tabular) doesn’t support aggregations. Power BI takes the middle road. Like MD, Power BI will search for suitable aggregations to answer summarized queries, but it requires more work on your part to set them up.

Aggregates in Power BI aren’t as simple as they were in Analysis Services Multidimensional, but they do exist, and hopefully the Power BI team will improve upon them in future versions.
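To make the idea concrete, here is a minimal sketch of a pre-summarized table built in Power Query’s M, assuming a hypothetical Sales table with Date and SalesAmount columns. (The feature itself maps an aggregation table to its detail table inside the Power BI model; this only illustrates what such a summary table is.)

let
	Source = Sales,
	// One row per date; queries at or above this grain can be answered from the much smaller table
	SalesAgg = Table.Group(
		Source,
		{"Date"},
		{{"SalesAmount", each List.Sum([SalesAmount]), type number}}
	)
in
	SalesAgg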


5 Reasons To Like Power BI Composite Models

Devin Knight has a video explaining why it’s worth getting excited over Power BI Composite Models:

Another special new capability that we gain with Composite Models is native support for many-to-many relationships. I see this as incredibly helpful to new Power BI users who are not familiar with strict requirements around relationships.

Traditionally, Power BI has only been able to create relationships that are one-to-many. This means that one of the two tables in a relationship must have a column with unique values for the relationship to work. With this new feature, a table no longer needs a column with unique values.

For those of you who have a data modeling background, this may be a little different from what you traditionally envision a many-to-many relationship to be. Normally a many-to-many relationship includes a bridge table to bring the two tables together, but now with Power BI Composite Models the relationship can be defined directly between them.
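Before this feature, relating two tables on a non-unique column typically meant adding a bridge table or simulating the relationship in DAX. As a hedged illustration, here is the kind of TREATAS workaround the new relationship type can replace, assuming hypothetical Sales and Budget tables that share a non-unique Region column:

Budget (Sales Regions) :=
CALCULATE (
	SUM ( Budget[Amount] ),
	-- Virtual relationship: apply the regions visible in Sales as a filter on Budget
	TREATAS ( VALUES ( Sales[Region] ), Budget[Region] )
)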

Check it out.


A Recommendation Against Using TOTALYTD

Marco Russo recommends against using the TOTALYTD function in DAX:

You have been warned. I don’t like the code above. If you want to apply filters, use CALCULATE. Maybe the code written using CALCULATE is verbose and boring, but it’s easier to read. If you are looking for an enjoyable read for your free time, you can choose a novel rather than playing with an obscure DAX syntax.

So, why write a blog post about this? The reason is that TOTALYTD has the ability to accept a scalar value as an argument instead of (or after) the filter. This is the same additional optional parameter accepted by DATESYTD in case the fiscal year does not end on December 31st. The problem is that the third argument of TOTALYTD is a filter if you use a predicate or a table expression, and it’s a marker for the end of the year in case it’s a scalar value. Confusing enough? Ok, let me present some examples.
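To make the ambiguity concrete, here is a hedged sketch in DAX, assuming a [Sales Amount] measure and 'Date' and 'Product' tables (the names are illustrative, not Marco’s own examples):

-- Third argument as a filter: a predicate restricts the YTD calculation
Sales YTD Red :=
TOTALYTD ( [Sales Amount], 'Date'[Date], 'Product'[Color] = "Red" )

-- Third argument as a scalar: the same position now marks a June 30 fiscal year end
Sales FYTD :=
TOTALYTD ( [Sales Amount], 'Date'[Date], "06-30" )

-- The more verbose but unambiguous CALCULATE equivalent
Sales FYTD CALCULATE :=
CALCULATE ( [Sales Amount], DATESYTD ( 'Date'[Date], "06-30" ) )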

Read the whole thing.


Improving Power Query Performance By Reducing Variables

Reza Rad shows us an extreme case of variable fatigue in Power Query:

This is a sample Power Query file, in which I do a very simple transformation: adding one to an existing number. However, in this sample, we are doing it over thousands of steps; one step at a time, we add thousands to a number. The main reason to do it this way is to show you the performance you get when you have too many variables (or, let’s say, steps) in Power Query.

Reza reduces the number of steps and turns a 15-minute operation into one which finishes in under a second.
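A minimal sketch of the pattern (not Reza’s exact file): the first version mimics the step-per-operation approach, while the second collapses the same arithmetic into a single step with List.Accumulate:

// Step-per-operation version; imagine thousands of these steps
let
	Source = 0,
	Step1 = Source + 1,
	Step2 = Step1 + 1,
	Step3 = Step2 + 1
in
	Step3

// Collapsed version: one step does the same work
let
	Source = 0,
	Result = List.Accumulate({1..3}, Source, (state, current) => state + 1)
in
	Result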


Power BI August Release And SSAS Performance Improvements

Chris Webb points out something new in the Power BI August 2018 release:

While I was playing around with the new release (August 2018) of Power BI Desktop I noticed there was an undocumented change: similar to the OData improvements I blogged about here, there is a new option in the AnalysisServices.Database() and AnalysisServices.Databases() M functions that turns on a newer version of the MDX generation layer used by the Power Query engine. Like the OData improvements it is an option called Implementation="2.0", used like this:

AnalysisServices.Databases(
	"localhost", 
	[
		TypedMeasureColumns=true, 
		Implementation="2.0"
	]
)

…and also, as with the OData improvements, you will need to manually edit any existing M queries to take advantage of this.

Read on for Chris’s test and analysis of the resulting MDX output.


Dataflows In Power BI

James Serra gives us a preview of Power BI Dataflows:

In short, Dataflows integrates data lake and ETL technology directly into Power BI, so anyone with Power Query skills (yes, Power Query is now part of the Power BI service, not just Power BI Desktop, and is called Power Query Online) can create, customize, and manage data within their Power BI experience (think of it as self-service data prep). Dataflows include a standard schema, called the Common Data Model (CDM), that contains the most common business entities across the major functions such as marketing, sales, service, and finance, along with connectors that ingest data from the most common sources into these schemas. This greatly simplifies modeling and integration challenges (it prevents multiple metadata definitions for the same data). You can also extend the CDM by creating custom entities. Lastly, Microsoft and their partners will be shipping out-of-the-box applications that run on Power BI, populate data in the Common Data Model, and deliver insights through Power BI.

A dataflow is not just the data itself, but also the logic for how the data is manipulated. Dataflows belong to the Data Warehouse/Mart/Lake family. Their main job is to aggregate, cleanse, transform, integrate, and harmonize data from a large and growing set of supported on-premises and cloud-based data sources, including Dynamics 365, Salesforce, Azure SQL Database, Excel, and SharePoint. Dataflows hold a collection of data lake-stored entities (i.e., tables) which are stored in internal Power BI Common Data Model-compliant folders in Azure Data Lake Storage Gen2.

Also check out the comments for some clarification on why you’d want to use Dataflows rather than doing the work directly in the data lake.
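For a sense of what authoring looks like: since dataflow entities are defined with Power Query Online, each entity is just an M query. A minimal hypothetical sketch (the server, database, and column names are invented), shaping a source table toward a CDM-style entity:

let
	Source = Sql.Database("myserver.database.windows.net", "crm"),
	Accounts = Source{[Schema = "dbo", Item = "Accounts"]}[Data],
	// Rename columns to line up with the target Common Data Model entity
	Renamed = Table.RenameColumns(Accounts, {{"AccountId", "accountid"}, {"AccountName", "name"}})
in
	Renamed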


Dual Storage Mode In Power BI

Teo Lachev takes us through the Dual storage mode now available in Power BI:

As the name implies, the dual storage mode is a hybrid between Import and DirectQuery. Like importing data, the dual storage mode caches the data in the table. However, it leaves it up to Power BI to determine the best way to query the table depending on the query context. Consider the following schema where all tables come from the same data source. Let’s assume the following configuration:

  • FactInternetSales is imported

  • FactResellerSales is DirectQuery

  • DimDate and DimProduct are Dual

Read on for more.
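As a hedged illustration of what “query context” means here, consider two hypothetical DAX queries against the model above:

-- Dual dimension joined to an imported fact: Power BI can answer this from the in-memory cache
EVALUATE
SUMMARIZECOLUMNS (
	'DimDate'[CalendarYear],
	"Internet Sales", SUM ( FactInternetSales[SalesAmount] )
)

-- The same Dual dimension joined to a DirectQuery fact: the join can be pushed to the source as a single query
EVALUATE
SUMMARIZECOLUMNS (
	'DimDate'[CalendarYear],
	"Reseller Sales", SUM ( FactResellerSales[SalesAmount] )
)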


Nested Calculations In Power Query

Michael Humpherys shows how to use nested calculations in Power Query to make financial calculations easier:

A central problem in finance is answering the simple question: how much is this contract worth? For example, Bob might say he’ll give me $102 in a year, and I want to know how much I should pay him for that guaranteed money. If I figure out that the value of the contract is $100, then I’m saying that the guaranteed $102 in a year is worth $100 today. This means I get 2% interest on my $100 investment. This is called the one-year spot rate, and there are similar rates for all sorts of different time frames. Taking 1 / (1 + 0.02) gives me the discount factor, and multiplying this by the $102 payment gets me to the $100 value of the contract.

The next step is that I may want to know how much $102 two years from now is worth next year. So instead of figuring out what it is worth today, I want to know what it will be worth in a year. To figure this out, I need something called the forward rate, which tells me the annual interest rate one year in the future, in this example.

With the forward rates, I can take a complex series of future payments and find the value of all of those payments today, but also the value at different points in the future. The complexity is that, depending on the date I want to value them to and the timing of the payments, I need to use different sets of forward rates, and that’s the application I’ll walk through below.
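A minimal sketch of the underlying arithmetic in M, with invented rates (the post itself builds this up with nested tables): discounting a payment due in three years back to today divides by each (1 + forward rate) in turn, while valuing it as of year 1 skips the first rate:

let
	// Hypothetical one-year forward rates for years 1, 2, and 3 (the first is the one-year spot rate)
	ForwardRates = {0.02, 0.025, 0.03},
	Payment = 102,
	// Value today: divide by every (1 + rate)
	ValueToday = List.Accumulate(ForwardRates, Payment, (value, rate) => value / (1 + rate)),
	// Value as of year 1: only the later forward rates apply
	ValueAtYearOne = List.Accumulate(List.Skip(ForwardRates, 1), Payment, (value, rate) => value / (1 + rate))
in
	[ValueToday = ValueToday, ValueAtYearOne = ValueAtYearOne]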

That is a novel use of the “table in a cell” technique.


Performance Test: Loading CSV Versus Loading Excel In Power Query

Chris Webb lays out a performance test which shows how quickly Power Query can read data from a CSV versus from an Excel spreadsheet:

The black line in the graph above is the amount of data read (actually the offset values showing where in the file the data is read from, which is the same thing as a running total when Power Query is reading all the data) from the Excel file; the green line is the amount of data read from the CSV file (the same data shown in the first graph above). A few things to mention:

  • Running Process Monitor while this second query was refreshing had a noticeable impact on its performance – in fact it was almost 20 seconds slower

  • The initial values of 80 million bytes seem to be where data is read from the end of the Excel file. Maybe this is Power Query reading some file metadata? Anyway, it seems as though it takes 5 seconds before it starts to read the data needed by the query.

  • There’s a plateau between the 10 and 20 second mark where not much is happening; this didn’t happen consistently and may have been connected to the fact that Process Monitor was running

The results were remarkable; check them out.
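For reference, a minimal sketch of the two kinds of reads being compared, assuming hypothetical local file paths:

let
	// CSV: Power Query can stream the text file directly
	FromCsv = Csv.Document(File.Contents("C:\Data\demo.csv"), [Delimiter = ",", Encoding = 65001]),
	// Excel: the workbook container has to be unpacked before the sheet data can be read
	FromExcel = Excel.Workbook(File.Contents("C:\Data\demo.xlsx"), true){[Item = "Sheet1", Kind = "Sheet"]}[Data]
in
	FromCsv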
