Press "Enter" to skip to content

Category: Microsoft Fabric

Reducing Power BI Dataset Sizes with Semantic Link

Sandeep Pawar builds some really cool diagnostics:

Semantic Link v0.6 is out and it has many new exciting additions to its growing list of list_* methods. Highlighted are some of the new methods. Install the latest version and check it out.

Some of the existing methods such as list_columns() have an additional parameter extended which returns more column information such as column cardinality, size, encoding and many more column properties. This allows users to get detailed information about the dataset and the columns.

Click through to see how you can get this information not just for a single semantic model, but for all semantic models in a tenant.

Comments closed

Using Apache Spark in Microsoft Fabric

Ginger Grant gives us an overview of where we can use Apache Spark in Microsoft Fabric:

If you have used Spark in Azure Synapse, prepare to be pleasantly surprised with the compute experience in Microsoft Fabric as Spark compute starts a lot faster because the underlying technology has changed. The Data Engineering and Data Science Fabric experiences include a managed Spark compute, which like previous Spark compute charges you when it is in use. The difference is the nodes are reserved for you, rather than allocated when you start the compute which results in compute starting in 30 seconds or less versus the 4 minutes of waiting it takes for Azure Synapse compute to start.  If you have different capacity needs that a default managed Spark compute will not provide, you can always create a custom pool.  Custom pools are created in a specific workspace, so you will need Administrator permissions on the workspace to create them. You can choose to make the new pool your default pool as well, so it will be what starts in the workspace.

Read on for more of Ginger’s thoughts on the matter, including how you can use Copilot in Microsoft Fabric (if you pay for it) to help generate Spark code.

Comments closed

Microsoft Fabric and Semantic Models

Kurt Buhler has a choose-your-own-adventure story:

Semantic models are integral to Microsoft Fabric. They use and are used by many of the different workloads. In Fabric, there’s more items that can connect to and consume your model—such as semantic link in notebooks. Because of these new options and tools, your model is exposed to additional types of users who will use it in different ways. As such, it’s important that you make good models that you manage well throughout their entire lifecycle.

Read on for more information and three separate scenarios

Comments closed

Renaming Multiple Columns in a PySpark Notebook

Gilbert Quevauvilliers wants one rename to rule them all:

Following on from my previous blog post this blog post I’m going to demonstrate how to bulk rename column names in a single step instead of having to rename them individually.

The reason this came about is because I had a set of data where the column names had the square brackets which I wanted to remove.

As shown below I have highlighted 2 column names with the square brackets.

Read on to see how you can perform somewhat-generic rename operations in Spark notebooks.

Comments closed

Metadata-Driven Pipelines in Microsoft Fabric

John Miner returns to the old ways:

What is a metadata driven pipeline? Wikipedia defines metadata as “data that provides information about other data”. As a developer, we can create a non parameterized pipeline and/or notebook to solve a business problem. However, if we have to solve the same problem a hundred times, the amount of code can get unwieldly. A better way to solve this problem is to store metadata in the delta lake. This data will drive how the Azure Data Factory and Spark Notebooks execute.

Read on to see how you can accomplish this task.

Comments closed

Row-Level Security and USERELATIONSHIP() with Inactive Relationships

Marco Russo and Alberto Ferrari have a public service announcement:

USERELATIONSHIP is a very common and helpful function, used whenever there are multiple relationships between tables and developers need to decide which relationship to use. However, in some scenarios, this common function raises an annoying error:

The UseRelationship() and CrossFilter() functions may not be used when querying ‘Sales’ because it is constrained by row-level security.

As with all the error messages, this requires some understanding and further explanation. Moreover, a workaround is straightforward to find. However, the workaround has some subtle restrictions that need to be well understood.

Read on to learn more.

Comments closed

Full and Incremental Loads in Microsoft Fabric

John Miner continues a series on data engineering in Microsoft Fabric:

In a data lake, we have a bronze quality zone that supposed to represent the raw data in a delta file format. This might include versions of the files for auditing. In the silver quality zone, we have a single version of truth. The data is de-duplicated and cleaned up. How can we achieve these goals using the Apache Spark engine in Microsoft Fabric?

Read on for John’s take on the answer. I’ve found that I have a fairly good answer for smaller datasets, though as the size of the data gets larger, the less I like answers for the raw layer.

Comments closed

Renaming a Column in Microsoft Fabric via Python Notebook

Gilbert Quevauvilliers performs a rename:

I thought it would be good to help others in terms of my learning journey when working with partner notebooks and Microsoft fabric.

In today’s blog post, I am going to show you how to rename a column. In my experience this came up because I had a column name which had a forward slash “/” in it which caused the loading of the data for the table to fail because this is a reserved character.

Read on for the code an example of how it works in action.

Comments closed

Environmental Deployment in Microsoft Fabric

Kevin Chant takes us through deployment pipelines in Microsoft Fabric:

One question that I get frequently asked is how many workspaces are required? In reality, the answer is that it depends.

However, if you want your solution to be flexible and loosely coupled I do recommend at the very least one Microsoft Fabric workspace per environment.

That’s also required if you’re using deployment pipelines, as each stage in the pipeline pushes to a unique workspace.

Comments closed