
Category: Microsoft Fabric

Renaming Multiple Columns in a PySpark Notebook

Gilbert Quevauvilliers wants one rename to rule them all:

Following on from my previous blog post, in this post I’m going to demonstrate how to bulk rename columns in a single step instead of having to rename them individually.

This came about because I had a set of data where the column names contained square brackets, which I wanted to remove.

As shown below I have highlighted 2 column names with the square brackets.

Read on to see how you can perform somewhat-generic rename operations in Spark notebooks.
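
For a sense of what a bulk rename can look like, here is a minimal sketch. It is not necessarily the approach in Gilbert's post. The helper names `strip_brackets` and `rename_all` are hypothetical; the only PySpark pieces assumed are `DataFrame.columns` and `DataFrame.toDF`, which renames every column positionally.

```python
import re


def strip_brackets(name: str) -> str:
    """Remove square brackets from a single column name (hypothetical helper)."""
    return re.sub(r"[\[\]]", "", name)


def rename_all(df, clean=strip_brackets):
    """Return a new DataFrame with `clean` applied to every column name.

    Assumes `df` is a PySpark DataFrame: `df.columns` lists the current
    names and `df.toDF(*names)` returns a copy with the new names.
    """
    return df.toDF(*[clean(c) for c in df.columns])
```

In a notebook this would be called as `df = rename_all(df)`. Passing a different `clean` function covers other patterns, such as trimming whitespace or lowercasing.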

Comments closed

Metadata-Driven Pipelines in Microsoft Fabric

John Miner returns to the old ways:

What is a metadata-driven pipeline? Wikipedia defines metadata as “data that provides information about other data”. As developers, we can create a non-parameterized pipeline and/or notebook to solve a business problem. However, if we have to solve the same problem a hundred times, the amount of code can get unwieldy. A better way to solve this problem is to store metadata in the delta lake. This data will drive how the Azure Data Factory and Spark Notebooks execute.

Read on to see how you can accomplish this task.
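
The general idea can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not John's implementation: `LoadSpec`, its fields, the `"full"` and `"incremental"` labels, and `run_pipeline` are all hypothetical names. In a real notebook the specs would be read from a Delta table and the handlers would wrap Spark reads and writes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class LoadSpec:
    """One metadata row: what to load, where to put it, and how."""
    source_path: str
    target_table: str
    load_type: str  # hypothetical values: "full" or "incremental"


def run_pipeline(
    specs: List[LoadSpec],
    handlers: Dict[str, Callable[[LoadSpec], None]],
) -> List[str]:
    """Dispatch each metadata row to the handler registered for its load type."""
    processed = []
    for spec in specs:
        handler = handlers.get(spec.load_type)
        if handler is None:
            raise ValueError(f"No handler for load type {spec.load_type!r}")
        handler(spec)
        processed.append(spec.target_table)
    return processed
```

Adding a new table then means adding a metadata row rather than writing another notebook.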


Row-Level Security and USERELATIONSHIP() with Inactive Relationships

Marco Russo and Alberto Ferrari have a public service announcement:

USERELATIONSHIP is a very common and helpful function, used whenever there are multiple relationships between tables and developers need to decide which relationship to use. However, in some scenarios, this common function raises an annoying error:

The UseRelationship() and CrossFilter() functions may not be used when querying ‘Sales’ because it is constrained by row-level security.

As with all the error messages, this requires some understanding and further explanation. Moreover, a workaround is straightforward to find. However, the workaround has some subtle restrictions that need to be well understood.

Read on to learn more.


Full and Incremental Loads in Microsoft Fabric

John Miner continues a series on data engineering in Microsoft Fabric:

In a data lake, we have a bronze quality zone that is supposed to represent the raw data in a delta file format. This might include versions of the files for auditing. In the silver quality zone, we have a single version of truth. The data is de-duplicated and cleaned up. How can we achieve these goals using the Apache Spark engine in Microsoft Fabric?

Read on for John’s take on the answer. I’ve found that I have a fairly good answer for smaller datasets, though the larger the data gets, the less I like my answers for the raw layer.
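
The "single version of truth" rule for the silver zone amounts to keeping the latest version of each key. The sketch below shows that rule in plain Python so the logic is easy to follow. It is not John's code, the function name `latest_per_key` is hypothetical, and the column names are supplied by the caller. On Spark the same rule is usually expressed with a window function rather than a Python loop.

```python
from typing import Any, Dict, List


def latest_per_key(
    rows: List[Dict[str, Any]], key: str, version: str
) -> List[Dict[str, Any]]:
    """Keep one row per `key`: the one with the highest `version` value.

    Output order follows the first appearance of each key in `rows`.
    """
    latest: Dict[Any, Dict[str, Any]] = {}
    for row in rows:
        k = row[key]
        # Replace the stored row only when this one is strictly newer.
        if k not in latest or row[version] > latest[k][version]:
            latest[k] = row
    return list(latest.values())
```

With ties on `version`, the first row seen wins. A real load would need an explicit tie-breaker.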


Renaming a Column in Microsoft Fabric via Python Notebook

Gilbert Quevauvilliers performs a rename:

I thought it would be good to help others by sharing my learning journey when working with Python notebooks and Microsoft Fabric.

In today’s blog post, I am going to show you how to rename a column. This came up because I had a column name containing a forward slash “/”, a reserved character, which caused the loading of the data for the table to fail.

Read on for the code and an example of how it works in action.
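
As a rough sketch of a single-column rename, assuming a PySpark DataFrame: `DataFrame.withColumnRenamed(existing, new)` is the PySpark call, while the helpers `safe_name` and `rename_column` and the choice of an underscore as the replacement are assumptions made for illustration, not taken from Gilbert's post.

```python
def safe_name(name: str, replacement: str = "_") -> str:
    """Replace forward slashes in a column name (replacement is an assumption)."""
    return name.replace("/", replacement)


def rename_column(df, old_name: str):
    """Rename one column so it no longer contains a forward slash.

    Assumes `df` is a PySpark DataFrame; `withColumnRenamed` returns a new
    DataFrame and leaves the data untouched.
    """
    return df.withColumnRenamed(old_name, safe_name(old_name))
```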


Environmental Deployment in Microsoft Fabric

Kevin Chant takes us through deployment pipelines in Microsoft Fabric:

One question that I frequently get asked is how many workspaces are required. In reality, the answer is that it depends.

However, if you want your solution to be flexible and loosely coupled, I do recommend at the very least one Microsoft Fabric workspace per environment.

That’s also required if you’re using deployment pipelines, as each stage in the pipeline pushes to a unique workspace.


Notes on Data Engineering in Microsoft Fabric

John Miner shares some notes. Part 1 looks at getting started and tables, both managed and unmanaged:

The architectural diagram shows how information flows from a source system into a delta lake house, is transformed by programs, and is used by end users. To get source data into the lake, we can use any of three methods to retrieve the data as files: pipelines (traditional Azure Data Factory components), dataflows (wrangling data flows based on Power Query), and shortcuts (the ability to link external storage to the lake). Once the data is in the lake, there are two types of programs that can transform the data files: Spark notebooks and data flows.

Part 2 covers file and folder management:

In practice, I have seen an additional quality zone called raw be used to stage files in their native format before converting to a delta file format. Please note, the lake house uses either shortcuts or pipelines to get files into the lake. We will talk more about bronze, silver and gold zones when I cover full and incremental loading later in this article.

Read on for John’s thoughts.


Viewing DAX in Microsoft Fabric with SemPy

Kevin Chant talks about a recent issue:

Recently I have been helping others get up to speed with Microsoft Fabric, which includes going through some Power BI topics.

One issue that came up was how to show them the DAX used for a measure within a Power BI report that had been published to Microsoft Fabric, in order to link working with measures in Power BI Desktop with working in Microsoft Fabric.

Kevin shows the normal way of doing this, as well as an alternative using the SemPy library.


Accessing the Purview Portal in Your Fabric Environment

Kevin Chant enables a feature:

In this post I want to cover accessing the new Microsoft Purview portal in your own Microsoft Fabric environment.

To clarify, I mean a Microsoft Fabric environment you have created for your own use, like the one I covered in a previous post.

You can do this in a trial environment thanks to the new capability provided by Microsoft last year to infuse Microsoft Fabric items into Microsoft Purview, which Microsoft covered in a blog post about Microsoft Fabric items in Microsoft Purview.

Read on to see how.


Automating Microsoft Fabric Capacity Scaling via Logic App

Soheil Bakhshi does some scaling:

In a previous post I explained how to manage the capacity costs of a Fabric F capacity (under the Pay-As-You-Go pricing model) using Logic Apps to suspend and resume it.

A customer who read my previous blog asked me, “Can we use a similar method to scale up and down before and after specific workloads?” This blog post answers exactly that.

This is pretty neat, though I wonder how long it takes and how much downtime it produces.
