Press "Enter" to skip to content

Category: Notebooks

Querying a Fabric SQL Endpoint via Notebook and T-SQL

Sandeep Pawar talks about a Spark connector:

I am not sharing anything new. The Spark data warehouse connector has been available for a couple of months now. It had some bugs, but it seems to be stable now. This connector allows you to query the lakehouse or warehouse endpoint in a Fabric notebook using Spark. You can read the documentation for details, but below is a quick pattern that you may find handy.

Despite it not being anything new, it is still interesting to see the use case of writing T-SQL instead of Spark SQL.
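
For orientation, here is a minimal sketch of the pattern, assuming the Fabric Spark runtime (which exposes the connector as a synapsesql method on the Spark reader) and placeholder warehouse, schema, and table names; Sandeep's post has the exact T-SQL pattern he uses:

# The Fabric runtime registers the warehouse connector via this import
import com.microsoft.spark.fabric

# Read a warehouse (or lakehouse SQL endpoint) table into a Spark DataFrame
# using the three-part name <warehouse>.<schema>.<table>
df = spark.read.synapsesql("MyWarehouse.dbo.FactSales")
df.show(10)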

mssparkutils Is Now notebookutils, and Validating DAGs in Fabric

Sandeep Pawar gives us two quick hits:

First, if you haven’t noticed, mssparkutils has been officially renamed to notebookutils. Check out the official documentation for details. Be sure to update your notebooks to use notebookutils.

Read on for a pair of notes around this name change, as well as some capabilities to validate DAGs when using runMultiple to orchestrate multiple notebook executions.
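
For context, a minimal sketch of what the DAG-based orchestration looks like, with placeholder notebook names; the validateDAG call is the capability Sandeep describes, so treat its exact behavior as his post's material rather than a long-standing API:

DAG = {
    "activities": [
        {"name": "NotebookA", "path": "NotebookA", "timeoutPerCellInSeconds": 120, "dependencies": []},
        {"name": "NotebookB", "path": "NotebookB", "timeoutPerCellInSeconds": 120, "dependencies": ["NotebookA"]},
    ]
}

# Validate the DAG definition before running it, then execute
notebookutils.notebook.validateDAG(DAG)
notebookutils.notebook.runMultiple(DAG)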

Databricks Notebook Package Installation and Variables

Chen Hirsh diagnoses a problem:

A friend called to ask for my help with a weird issue. In a Databricks notebook using Python, he declares and assigns a variable in the first cell. Something like this:

my_var = 1

He then runs the rest of the notebook, and somewhere along the way, tries to use this variable, and gets this message:

NameError: name 'my_var' is not defined

Going back to cell 1 and checking the value of my_var, he gets the same error.

Read on for the root cause of the issue, as well as a pair of helpful tips from Chen.
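
Without spoiling Chen's full diagnosis, the post title hints at the usual culprit: in Databricks, running a %pip command partway through a notebook restarts the Python interpreter, which wipes previously defined variables. A minimal sketch of that failure mode:

# Cell 1
my_var = 1

# Cell 2: %pip commands that modify the environment reset the Python
# interpreter, so all variables defined so far are lost
%pip install requests

# Cell 3
print(my_var)   # NameError: name 'my_var' is not defined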

Downloading Power Automate Scanner API Data into a Notebook

Gilbert Quevauvilliers creates a notebook:

I was recently working with a customer who had more than 100 app workspaces, and I was running into some challenges when using the Scanner API in Power Automate.

I then discovered this blog post, where they detailed how to download the Scanner API data (DataXbi – admin-scan.py). It was not quite in the format that I needed, so below is my modified code.

The reason that I am downloading the Scanner API data into a JSON file is that I find it easier to extract the data I need using Power BI Desktop.

Click through for the code and how it all works.
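
As rough orientation before diving in, here is a minimal sketch of the general Scanner API flow in Python; the token, workspace IDs, and output path are placeholders, and Gilbert's post has the complete, working version:

import json
import time
import requests

token = "<access token with Tenant.Read.All>"
headers = {"Authorization": f"Bearer {token}"}
base = "https://api.powerbi.com/v1.0/myorg/admin"

# 1. Kick off a scan (up to 100 workspaces per call)
body = {"workspaces": ["<workspace id 1>", "<workspace id 2>"]}
scan = requests.post(
    f"{base}/workspaces/getInfo?lineage=True&datasourceDetails=True"
    "&datasetSchema=True&datasetExpressions=True",
    headers=headers, json=body,
).json()

# 2. Poll until the scan finishes
while True:
    status = requests.get(
        f"{base}/workspaces/scanStatus/{scan['id']}", headers=headers
    ).json()["status"]
    if status == "Succeeded":
        break
    time.sleep(5)

# 3. Download the result and save it as JSON for Power BI Desktop
result = requests.get(
    f"{base}/workspaces/scanResult/{scan['id']}", headers=headers
).json()

with open("scanner_api_result.json", "w") as f:
    json.dump(result, f)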

Using AI Skills as Cell Magics in Microsoft Fabric Notebooks

Sandeep Pawar takes a look at a new preview capability:

The public preview of AI Skills in Microsoft Fabric was announced yesterday. AI Skills allows Fabric developers to create their own GenAI experience using data in the lakehouse. Unlike Copilot, which is an AI assistant, AI Skills lets users build a validated Q&A application that queries lakehouse data by converting natural language questions into T-SQL queries. It’s only available in paid F64+ SKUs. You can watch the video below for Copilot, AI Skills, and Gen AI experiences in Fabric:

Read on for more details on how it works.
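
The cell magic angle is the interesting part. A minimal illustration of the general idea, with a hypothetical query_ai_skill placeholder standing in for the actual AI Skill call (Sandeep's post shows the real request against the AI Skill endpoint):

from IPython.core.magic import register_cell_magic

def query_ai_skill(question: str) -> str:
    # Placeholder: call the AI Skill REST endpoint here
    raise NotImplementedError

@register_cell_magic
def aiskill(line, cell):
    # The cell body is the natural language question
    return query_ai_skill(cell.strip())

With that registered, a cell starting with %%aiskill can hold nothing but the question itself.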

Defining the Default Lakehouse for a Fabric Notebook

Sandeep Pawar sets up a default lakehouse:

I wrote a blog post a while ago on mounting a lakehouse (or, generally speaking, a storage location) to all nodes in a Fabric Spark notebook. This allows you to use the File API file path from the mounted lakehouse.

Mounting a lakehouse using mssparkutils.fs.mount() doesn’t define the default lakehouse of a notebook. To do so, you can use the configure magic as below:

Read on for that command, as well as some notes around using it.
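
For reference, the shape of that magic looks like the following; the lakehouse name and IDs here are placeholders, and note that %%configure generally has to run before the Spark session starts:

%%configure
{
    "defaultLakehouse": {
        "name": "MyLakehouse",
        "id": "<lakehouse id>",
        "workspaceId": "<workspace id>"
    }
}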

Getting the Top N Results in a PySpark Notebook

Gilbert Quevauvilliers only needs the top 1:

How to get the TopN rows using Python in Fabric Notebooks

When working with data, there are sometimes weird and wonderful requirements which must be met in order to get to the desired solution.

In today’s blog post I had a situation where I wanted to get a single row with the highest duration.

Gilbert uses the Spark SQL version, specifically the Python function variant. You could also use Spark SQL and write a query using the LIMIT operator.
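
A minimal sketch of both approaches, assuming df is your DataFrame and duration is the column of interest:

from pyspark.sql import functions as F

# DataFrame API: sort descending and keep the first row
top_row = df.orderBy(F.desc("duration")).limit(1)

# Spark SQL equivalent using LIMIT
df.createOrReplaceTempView("my_table")
top_row_sql = spark.sql("SELECT * FROM my_table ORDER BY duration DESC LIMIT 1")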

Authenticating to Fabric APIs via Sempy and Service Principals

Gilbert Quevauvilliers links everything together:

I have been doing a fair amount of work lately with Fabric Notebooks.

I am always conscious, when authenticating using a Service Principal, of making sure it is as secure as possible. To do this, I have found that I can use Azure Key Vault and Azure Identity to authenticate successfully.

Read on for some of the advantages of using Azure Key Vault for this sort of credential management, as well as how to get it all working.
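
A minimal sketch of the core idea, with placeholder vault, tenant, and app IDs; Gilbert's post covers the end-to-end setup, including the Fabric side:

from azure.identity import ClientSecretCredential

# Pull the service principal secret from Azure Key Vault at runtime
# (notebookutils is available in the Fabric notebook runtime)
key_vault = "https://<your-key-vault>.vault.azure.net/"
client_secret = notebookutils.credentials.getSecret(key_vault, "<secret name>")

credential = ClientSecretCredential(
    tenant_id="<tenant id>",
    client_id="<app (client) id>",
    client_secret=client_secret,
)

# Acquire a token scoped to the Fabric REST APIs
token = credential.get_token("https://api.fabric.microsoft.com/.default").token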

Running SemPy from Microsoft Fabric Notebooks

Gilbert Quevauvilliers sets up an environment:

I ran into an error when trying to run a notebook via a data pipeline, and the run failed.

Below are the steps to get this working.

This was the error message I got:

Notebook execution failed at Notebook service with http status code – ‘200’, please check the Run logs on Notebook, additional details – ‘Error name – MagicUsageError, Error value – %pip magic command is disabled.’ :

Read on to see how you can fix this error and get SemPy running.
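
As general orientation (not necessarily Gilbert's exact fix): %pip is an interactive magic, so when a pipeline-triggered run disables it, the practical workaround is to install the library in the environment attached to the notebook and import it directly:

# Interactive runs can install inline:
%pip install semantic-link

# Pipeline-triggered runs may have %pip disabled (the error above), so
# install semantic-link in the workspace environment attached to the
# notebook instead, and simply import it:
import sempy.fabric as fabric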

Renaming Multiple Columns in a PySpark Notebook

Gilbert Quevauvilliers wants one rename to rule them all:

Following on from my previous blog post, in this blog post I’m going to demonstrate how to bulk rename column names in a single step instead of having to rename them individually.

The reason this came about is because I had a set of data where the column names had square brackets, which I wanted to remove.

As shown below, I have highlighted two column names with the square brackets.

Read on to see how you can perform somewhat-generic rename operations in Spark notebooks.
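
One generic way to do this, assuming df is your DataFrame and the goal is stripping square brackets from every column name in a single pass (Gilbert's post walks through his own version):

# Build the cleaned names, then rename every column at once with toDF
cleaned = [c.replace("[", "").replace("]", "") for c in df.columns]
df = df.toDF(*cleaned)
print(df.columns)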
