Press "Enter" to skip to content

Category: Microsoft Fabric

Reading and Writing JSON Files in Microsoft Fabric

Tom Martens performs two of the three R’s:

JSON is a straightforward, text-based data-interchange format fully independent of any programming language. This simplicity is why JSON documents have become so widespread.

Often the result of querying a REST API is returned as a JSON document, but this is not the only use case why I consider it important to get familiar with reading from and writing to a JSON file. I think a JSON document is ideal for ingesting values into a Python script or any code, values that are helpful to control the logic and behavior of the script. But of course, it’s also easy to store the value of a variable inside the JSON document, where this value will be used when the script is running again.

Click through for some Python code to perform these operations. There are about a dozen other methods you can use as well, but this is one of the best.

Comments closed

Fabric and Databricks

A rare two-part compare and contrast!

First, Chen Hirsh directly contrasts Microsoft Fabric and Databricks:

Microsoft recently announced the general availability of Microsoft Fabric, which contains all (or most) cloud Data analytics services from Microsoft. This is a good opportunity to compare it with another popular data platform, which is also available in Azure (and other cloud services) – Databricks.

Before we start, I should note that Fabric is quite new, and it’s still hard to evaluate its performance and stability. Also, both products have many features, and I only try to discuss the main differences.

Then, Eugene Meidinger keys us in on the similarities:

One of the things that helps to understand Fabric is that it’s heavily influenced by Databricks. It’s built on delta lake, which is created and open sourced by Databricks 2019. You are encouraged to use a medallion architecture, which as far as I can tell, comes from Databricks.

You will be a lot less frustrated if you realize that much of what’s going on with Fabric is a blend of open source formats and protocols, but also is a combination of the idiosyncrasies of Databricks and then those of Microsoft. David Gomes has good post about data lake file formats, and it’s interesting to imagine the parallel universe where Fabric is built on Iceberg (which is also based on Parquet files) instead of delta lake. (Note, I found this post from this week’s issue of Brent Ozar’s Newsletter)

Comments closed

Fabric Data Pipeline for Blob Storage CSV into Azure SQL DB

Andy Leonard loads some data:

In November 2023, I shared how to start learning Microsoft Fabric in a post titled Start a Fabric Free Trial. In December 2023, I shared how to Create a Workspace in Fabric. In this post, I document one way to create a pipeline to load data from a CSV file stored in Azure Blob Storage to Azure SQL Database in your new Fabric workspace.

Click through for some key assumptions, as well as the process.

Comments closed

Wrapping up the Advent of Microsoft Fabric

Tomaz Kastrun gets to 25. Day 24 covers OneLake in Fabric:

OneLake comes automatically with every Microsoft Fabric tenant and represents a single, logical data lake. Its main features are its unification and one copy of data across the organization and multiple analytical engines.

And Day 25 provides some additional references Tomaz has found useful along the way:

To wrap up the series, let’s check the material available online, for you to continue learning, exploring and enjoying Microsoft Fabric.

All in all, this has been a really good series and well worth going through if you are learning Microsoft Fabric.

Comments closed

Fabric F2 Performance

Teo Lachev has started a new series. We begin with warehouse ETL:

As inspired by Amir Netz‘s encouragement to partners to test the Fabric F2 capacity performance, I got on a quest to test what it would do to ETL loads for Fabric Warehouse. I must admit that I was skeptical that a quarter of a core would take a warehouse off the ground, but as usual, life proved me wrong and “wrong” is a big understatement of what happened.

After provisioning a Fabric F2 capacity and a warehouse, I settled on the Retail Data Model for World Wide Importers sample star schema dataset consisting of five dimension tables and one fact table. In terms of performance, I was mostly interested in how long it would take for the ADF copy activity to insert all the data (50 million rows) in the fact table. Granted, it’s a limited test but enough to rule out the technology for real-life projects. Then, I compared the performance against Azure SQL Database Serverless running on up to 2 cores and provisioned by the free trial offer that Microsoft has on Azure. To exclude impact on data transfer between regions, both technologies were provisioned on East US 2 data region, which is the region where my Power BI tenant is hosted on.

Then we have report load time:

What a better way to spend a lazy holiday afternoon than to do more Fabric performance testing? In my previous post, I shared my results from a single-threaded ETL load test to gauge the F2 ingest performance and F2 did pretty well (or at least outperformed Azure SQL DB). Will F2 hold as parallelism increases? Throughput testing is especially important for report loads because parallel tasks can run within a report, such as visuals executing DAX queries in parallel, and across reports, such as when concurrent report requests overlap.

I’m legitimately surprised at the results. I expected F2 to be barely sufficient for testing purposes. Read both posts to see how it performs and some caveats around performance.

Comments closed

Continuing the Advent of Fabric

Tomaz Kastrun has me playing catch-up. First up, monitoring workspaces in Fabric:

The easy way to check, view and track your activities and execution and runs of notebooks, data pipelines, data factory executions, datasets refresh, and many others.

Next, third-party applications:

Apps are collections of dashboards and reports in one easy-to-find place. Go to Apps and click on “Get Apps”.

And finally, the admin portal:

Admin portal serves purpose for governing and setting the Microsoft Fabric, where you can make  tenant settings, also access the Microsoft 365 admin portal, and control how users interact with Microsoft Fabric.

To access the admin portal, not only you need a Fabric license but also admin rights with the following roles (in one of these roles; if you are not, you can only see Capacity setting in the admin portal):

  • Global administrator
  • Power Platform administrator
  • Fabric administrator

If you’re just getting started with Microsoft Fabric, you could do a lot worse than going through Tomaz’s series.

Comments closed

Notebook Concurrency in Microsoft Fabric

Ed Oldham takes us through a common problem:

If you are currently using Microsoft Fabric you will have some sort of capacity associated with your account. This will have a large impact on what you can run concurrently. If you are on a Fabric Trial, you will have access to a trial capacity and if you are paying you will be on a certain capacity tier based on how much you pay. The following diagram shows information about each level of capacity and the Trial. The Trial resembles F64 capacity but is apparently different in some important ways (More on that later).

Read on to learn more about capacity and what that means for concurrent notebooks and Spark jobs.

Comments closed

Power BI, Event Streaming, and Notebooks in Microsoft Fabric

Tomaz Kastrun continues a series on Microsoft Fabric. Day 18 has us looking at Power BI:

We have created a Power BI report directly from the datalake and today we will check how to do same with dashboard and paginated reports.

Day 19 covers event streaming:

In Fabric, you can create streaming semantic model and when selecting you will get the usual sources:

Day 20 shows how you can work with notebooks in Microsoft Fabric:

Notebooks have been around for a long time and people, community, and professionals have proven the usability, practicality, versioning and reliability of notebooks. Not to mention the clarity and hygiene. But opinions are also divided.

The purpose of this post today is to check for a couple of functionalities that might not be that straightforward when it comes to notebooks.

Comments closed

Making REST API Calls against Microsoft Fabric

Sandeep Pawar digs into the REST API:

Accessing Fabric REST endpoints in Fabric notebooks was already easy but it became easier and straightforward with semantic-link version 0.4.0. You can use the FabricRestClient class from sempy to set up a REST client and call the APIs. Authentication is automatically managed for you.

Click through to see how it works, as well as some warnings or things to keep in mind along the way.

Comments closed

Looping through Lakehouses in Microsoft Fabric Spark Jobs

Dennes Torres builds a loop:

I have published videos and articles before about Lakehouse maintenance. In this article I want to address a missing point for a lot of Fabric administrators: How to do maintenance on multiple lakehouses that are located in different workspaces.

One of the videos I have published explains the maintenance of multiple lakehouses, but only addresses maintenance in a single workspace. Is it a good idea to keep multiple lakehouses in the same workspace? Probably not.

Click through for the process.

Comments closed