Press "Enter" to skip to content

Author: Kevin Feasel

X Functions in DAX

Matt Allington explains what that “X” means in functions like MEDIANX and also builds out a problem to solve:

The X functions are a class of functions in DAX that are also known as “iterators” (note, there are other iterating functions too, like FILTER, but I am only referring to the X functions here). It is a big topic on its own, and this article is not going to be the definitive guide to X functions. But I will give you a couple of insights.

I have learnt a lot about how to teach people DAX over the last 6 years, and my teaching methods have evolved over that time. I remember fondly speaking at the Microsoft Data Insights Summit with Will Thompson on the topic “DAX 50 – DAX for the rest of us”. Will said to me “don’t mention the word ‘iterator’ as it is too confusing.” I didn’t agree with Will at the time, but his comment stuck with me. Over time I have changed the way I teach DAX. These days I show people how to add a calculated column in a table (everyone can do that – it’s dead easy). Then I explain that an X function does exactly the same thing, it’s just that you can’t see the interim results materialised in front of your eyes.
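
To make that idea concrete, here is a rough Python analogy (not DAX, and the table and column names are made up): a calculated column materializes the per-row results, while an X function such as SUMX evaluates the same per-row expression and aggregates it without ever storing the interim column.

    # A tiny stand-in for a table: one dict per row (names are made up).
    sales = [
        {"Quantity": 2, "Price": 10.0},
        {"Quantity": 1, "Price": 25.0},
        {"Quantity": 5, "Price": 3.0},
    ]

    # Calculated-column style: materialize the per-row result, then sum it.
    for row in sales:
        row["LineTotal"] = row["Quantity"] * row["Price"]
    total = sum(row["LineTotal"] for row in sales)

    # SUMX style: iterate the rows, evaluate the same expression per row,
    # and aggregate on the fly; no interim column is ever stored.
    total_x = sum(row["Quantity"] * row["Price"] for row in sales)

    assert total == total_x == 60.0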

Click through to see what Matt means and stay tuned for the next episode of The X Functions, where Matt reveals, by way of bi-directional relationships, that there are aliens.


Converting Hexadecimal to Decimal with Power BI

Soheil Bakhshi has a great function for us:

A while ago I wrote a blog post on how to use Unicode characters in Power BI. In that blog post I used a recursive Power Query function to convert hex values to decimal values. A few weeks back, one of my site visitors kindly shared his non-recursive version of the Power Query function, which beautifully does the job. A big shout out to Rocco Lupoi for sharing his code. So, I decided to share it with everyone so more people can leverage his nice Power Query function. I have touched his code a bit, but it was more of a cosmetic change, so all the credit for the post goes to Rocco.
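
The gist of a non-recursive conversion is a single left-to-right pass over the hex digits. Here is that idea sketched in Python rather than M (Python's built-in int(s, 16) does it in one call, but the loop shows the positional arithmetic the Power Query function performs):

    HEX_DIGITS = "0123456789ABCDEF"

    def hex_to_dec(hex_str: str) -> int:
        """Non-recursive hex-to-decimal: one pass, no recursion."""
        value = 0
        for ch in hex_str.upper():
            value = value * 16 + HEX_DIGITS.index(ch)
        return value

    assert hex_to_dec("1F4") == 500
    assert hex_to_dec("1F4") == int("1F4", 16)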

Click through for the details.


Keeping .NET Tools Up to Date

Max Trinidad has a public service announcement for us:

Have you installed any of the .NET tools, such as “.NET Interactive” and “PowerShell Global”? If so, you’ll need to remember to update these tools manually.

These tools give you the ability to create Jupyter Notebooks using not only the Python kernel but also the C#, F#, and PowerShell 7 kernels.
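
For reference, global .NET tools are listed and updated from the command line with the standard dotnet CLI; a quick check looks roughly like this (the package IDs shown are the usual ones for these two tools):

    # List the global tools currently installed (name, version, commands).
    dotnet tool list --global

    # Update each tool to its latest released version.
    dotnet tool update --global Microsoft.dotnet-interactive
    dotnet tool update --global PowerShell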

Read on to see how to check if you need to update either of these.


Spark Direct Reader in Hive

Anishek Agarwal, et al., announce a new reader for the Hive Warehouse Connector:

Apache Hive supports transactional tables which provide ACID guarantees. There has been a significant amount of work that has gone into Hive to make these transactional tables highly performant. Apache Spark provides some capabilities to access Hive external tables, but it cannot access Hive managed tables. To access Hive managed tables from Spark, the Hive Warehouse Connector needs to be used.

We are happy to announce Spark Direct Reader mode in the Hive Warehouse Connector, which can read Hive transactional tables directly from the filesystem. This feature has been available from the CDP-Public-Cloud-2.0 (7.2.0.0) and CDP-DC-7.1 (7.1.1.0) releases onwards.

Hive Warehouse Connector (HWC) was already available to provide access to Hive managed tables from Spark. However, since this involved communication with LLAP, there was an additional hop to get the data and process it in Spark, versus Spark's ability to read data directly from the filesystem for external tables. This leads to performance degradation when accessing managed tables compared to external tables. Additionally, a lot of use cases for HWC were associated with ETL jobs where a superuser was running the jobs to update data in multiple tables, so authorization was not a strong business need in this case. HWC Spark Direct Reader is an additional mode available in HWC which tries to address these concerns. This article describes the usage of the Spark Direct Reader to consume Hive transactional table data in a Spark application. It also introduces the methods and APIs to read Hive transactional tables into Spark dataframes. Finally, it demonstrates transaction handling and semantics while using this reader.
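
As a hedged sketch of what consumption looks like from PySpark: the extension and registrator class names below follow the Cloudera documentation for Direct Reader mode, but treat them, and the metastore URI (a placeholder here), as assumptions to verify against your CDP release.

    from pyspark.sql import SparkSession

    # Assumed Direct Reader configuration; verify the class names and the
    # metastore URI for your CDP release before relying on this.
    spark = (
        SparkSession.builder
        .appName("hwc-direct-reader-demo")
        .config("spark.sql.extensions",
                "com.qubole.spark.hiveacid.HiveAcidAutoConvertExtension")
        .config("spark.kryo.registrator",
                "com.qubole.spark.hiveacid.util.HiveAcidKyroRegistrator")
        .config("spark.hadoop.hive.metastore.uris",
                "thrift://metastore-host:9083")  # placeholder URI
        .getOrCreate()
    )

    # With the extension registered, ordinary Spark SQL against a Hive
    # transactional table reads files directly from the filesystem (no LLAP hop).
    df = spark.sql("SELECT * FROM some_db.some_acid_table")  # hypothetical table
    df.show()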

Click through to learn how it works and see it in action.


Postgres Change Data Capture into Kafka

Abhishek Gupta walks us through an example of change data capture to track events:

Change Data Capture (CDC) is a technique used to track row-level changes in database tables in response to create, update, and delete operations. Different databases use different techniques to expose these change data events, for example, logical decoding in PostgreSQL or the MySQL binary log (binlog). This is a powerful capability, but it is useful only if there is a way to tap into these event logs and make them available to other services which depend on that information.

Debezium does just that! It is a distributed platform that builds on top of Change Data Capture features available in different databases. It provides a set of Kafka Connect connectors which tap into row-level changes (using CDC) in database table(s) and convert them into event streams. These event streams are sent to Apache Kafka which is a scalable event streaming platform – a perfect fit! Once the change log events are in Kafka, they will be available to all the downstream applications.
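
Once Debezium is publishing, any Kafka client can read the change events. Here is a minimal Python consumer sketch; the topic name follows Debezium's default server.schema.table pattern, and the broker address and table are placeholders, not values from the demo.

    import json
    from kafka import KafkaConsumer  # pip install kafka-python

    consumer = KafkaConsumer(
        "myserver.public.orders",            # hypothetical Debezium topic
        bootstrap_servers="localhost:9092",  # placeholder broker address
        auto_offset_reset="earliest",
        value_deserializer=lambda v: json.loads(v) if v else None,
    )

    for message in consumer:
        event = message.value
        if event is None:
            continue  # tombstone record that follows a delete
        payload = event.get("payload", event)
        # op is "c" (create), "u" (update), "d" (delete), or "r" (snapshot read)
        print(payload["op"], payload.get("before"), payload.get("after"))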

Click through for the demo, using Azure components.


Storing SQL Server Database Files in Blob Storage

Tomaz Kastrun has a wacky idea:

Storing SQL Server database files in Azure Blob storage is a great solution for all the databases that are often migrated between instances, servers, or virtual machines, or that are divided between instances. This scenario also has a positive aspect to it, since the ability to create snapshot backups to Azure is seamless.

Following the steps, we will create an Azure Blob storage account where the MSSQL Server database files will reside, with MSSQL Server running on-prem. Assuming that you already have an Azure account (if not, you can get a free Azure account), let’s proceed by opening the Windows Terminal in PowerShell mode.

I’m impressed that it worked and could see it being an option for small demo databases, but I can’t imagine performance would be good enough for a production scenario.


Halloween Problem and Inserts

Jared Poche continues a dive into the Halloween Problem:

I would have expected us to scan the temp table, then have a LEFT JOIN to the base table. The Table Spool is the red flag that we have an issue with the plan, and is frequently seen with Halloween protections.

The index scan on the base table seems to be overkill since we’re joining on the primary key columns (the key lookup isn’t much of a concern). But we’re likely doing the scan because of the spool; it’s SQL Server’s way of getting all relevant records in one place at one time, breaking the normal flow of row mode operation, to make sure we don’t look up the same record multiple times.
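
As a loose analogy (Python, and not what SQL Server literally does): the spool plays the role of snapshotting the read phase before the write phase begins, so the statement cannot see its own output. That separation is the heart of Halloween protection.

    rows = [1, 2, 3]

    # Reading and writing the same structure in one pass is unsafe; rows
    # inserted mid-scan can be visited again by the very scan producing them:
    # for r in rows:
    #     rows.append(r)   # infinite loop: the scan keeps feeding itself

    # Spool-like fix: fully materialize the read phase first, then write.
    snapshot = list(rows)
    for r in snapshot:
        rows.append(r + 100)

    print(rows)  # [1, 2, 3, 101, 102, 103]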

Read on to see the execution plan as well as Jared’s fix.


Creating Power BI Measures via Visual Studio Code

Phil Seamark goes one step further with TOM:

My last blog introduced the idea of using Microsoft Visual Studio Code to work with Power BI models. For this article, I build on that idea by showing how you can use a TOM-based script to automatically generate measures in your Power BI (or Azure Analysis Services) model.

For simplicity, the example in this blog will do the following:

– Connect to an instance of Power BI Desktop
– Iterate through every Table in the model
– Iterate through every Column in the “current” table from the outer loop
– If the Column is numeric and not hidden, create a simple [Sum of <column>] measure
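
In rough Python-flavored pseudocode (the real script drives the .NET Tabular Object Model; every object, property, and method name below is illustrative rather than the actual TOM API), the loop amounts to:

    # Illustrative pseudocode only; not the real Tabular Object Model API.
    for table in model.tables:
        for column in table.columns:
            if column.is_numeric and not column.is_hidden:
                table.add_measure(
                    name=f"Sum of {column.name}",
                    expression=f"SUM('{table.name}'[{column.name}])",
                )
    model.save_changes()  # push the generated measures back to the model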

Read on for demonstration code and a walkthrough of the process.


Loading Data from S3 into Power BI

Gilbert Quevauvilliers loves a challenge:

I really enjoy a good challenge, and my customer has all of their data stored in AWS S3. Whilst there is no native connector, I thought there must be a way for me to get the data from AWS S3 into Power BI.

I did a bit of Googling and could not find any suitable solution. I also found and learnt that I could use AWS Athena to query the data living in S3. (I am definitely not an expert, nor do I have a lot of knowledge in the AWS space. I am fortunate that I have other people who know AWS and were able to set up, configure, and give me the details to connect to S3 via AWS Athena.)

Below are the steps on how I got this working.

Why they don’t have a proper connector is a bit of a head-scratcher to me, given the amount of data stored in S3 and the sheer number of connectors in Power BI.


Adding a Last Updated Time to Power BI Reports

Ed Hansberry adds useful information to a Power BI report:

It is often useful to tell your users when the report was refreshed so they have some idea of how current, or stale, the data is. Unfortunately, you cannot just add a TODAY() formula to your report, as that will change each time they open the report. This method will give you the refresh timestamp and compensate for Daylight Saving Time.
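
The underlying idea, sketched in Python rather than M (the time zone name is just an example): capture the timestamp once, at refresh, in UTC, then convert it to the report's local zone with DST-aware rules for display.

    from datetime import datetime, timezone
    from zoneinfo import ZoneInfo  # Python 3.9+

    # Capture the refresh moment once, in UTC...
    refreshed_utc = datetime.now(timezone.utc)

    # ...and convert for display; zoneinfo applies Daylight Saving rules.
    local = refreshed_utc.astimezone(ZoneInfo("America/New_York"))
    print(local.strftime("Data refreshed %Y-%m-%d %H:%M %Z"))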

This frustrates me a bit: displaying the time when a report was generated is really easy in Reporting Services and is certainly a good practice to follow when building reports, and yet there’s a multi-step process involving writing M code to do something which ought to be trivial.
