Notebooks – Page 4 – Curated SQL

The public preview of AI Skills in Microsoft Fabric was announced yesterday. AI Skills allows Fabric developers to create their own GenAI experience using data in the lakehouse. Unlike Copilot, which is an AI assistant, AI Skills lets users build a validated Q&A application that queries lakehouse data by converting natural language questions into T-SQL queries. It’s only available in paid F64+ SKUs. You can watch the below video for Copilot, AI Skills and Gen AI experiences in Fabric:

Read on for more details on how it works.

Comments closed

Defining the Default Lakehouse for a Fabric Notebook

Published 2024-07-26 by Kevin Feasel

Sandeep Pawar sets up a default lakehouse:

I wrote a blog post a while ago on mounting a lakehouse (or generally speaking a storage location) to all nodes in a Fabric spark notebook. This allows you to use the File API file path from the mounted lakehouse.

Mounting a lakehouse using mssparkutils.fs.mount() doesn’t define the default lakehouse of a notebook. To do so, you can use the configure magic as below:

Read on for that command, as well as some notes around using it.

Comments closed

Getting the Top N Results in a PySpark Notebook

Published 2024-06-06 by Kevin Feasel

Gilbert Quevauvilliers only needs the top 1:

How to get the TopN rows using Python in Fabric Notebooks

When working with data there are sometimes weird and wonderful requirements which must be created in order to get to the desired solution.

In today’s blog post I had a situation where I wanted to get a single row with the highest duration.

Gilbert uses the Spark SQL version, specifically the Python function variant. You could also use Spark SQL and write a query using the LIMIT operator.

Comments closed

Authenticating to Fabric APIs via Sempy and Service Principals

Published 2024-05-15 by Kevin Feasel

Gilbert Quevauvilliers links everything together:

I have been doing a fair amount of work lately with Fabric Notebooks.

I am always conscious to ensure that when I am authenticating using a Service Principal, I can make sure it is as secure as possible. To do this I have found that I can use the Azure Key Vault and Azure identity to successfully authenticate.

Read on for some of the advantages of using Azure Key Vault for this sort of credential management, as well as how to get it all working.

Comments closed

Running SemPy from Microsoft Fabric Notebooks

Published 2024-03-11 by Kevin Feasel

Gilbert Quevauvilliers sets up an environment:

Below is where I had an error when trying to run a notebook via a data pipeline and it failed.

Below are the steps to get this working.

This was the error message I got as shown below.

Notebook execution failed at Notebook service with http status code – ‘200’, please check the Run logs on Notebook, additional details – ‘Error name – MagicUsageError, Error value – %pip magic command is disabled.’ :

Read on to see how you can fix this error and get SemPy running.

Comments closed

Renaming Multiple Columns in a PySpark Notebook

Published 2024-02-29 by Kevin Feasel

Gilbert Quevauvilliers wants one rename to rule them all:

Following on from my previous blog post this blog post I’m going to demonstrate how to bulk rename column names in a single step instead of having to rename them individually.

The reason this came about is because I had a set of data where the column names had the square brackets which I wanted to remove.

As shown below I have highlighted 2 column names with the square brackets.

Read on to see how you can perform somewhat-generic rename operations in Spark notebooks.

Comments closed

Renaming a Column in Microsoft Fabric via Python Notebook

Published 2024-02-21 by Kevin Feasel

Gilbert Quevauvilliers performs a rename:

I thought it would be good to help others in terms of my learning journey when working with partner notebooks and Microsoft fabric.

In today’s blog post, I am going to show you how to rename a column. In my experience this came up because I had a column name which had a forward slash “/” in it which caused the loading of the data for the table to fail because this is a reserved character.

Read on for the code an example of how it works in action.

Comments closed

Running Jupyter Notebooks inside a For Loop

Published 2024-02-09 by Kevin Feasel

John Mount runs a series of notebooks:

In my opinion, a number of “moving data science to production” problems are solved if one could just use a Jupyter notebook inside a for-loop. The wvpy package supplies the tools put Jupyter notebooks inside for-loops using nbconvert capabilities, which I will demonstrate here.

Click through to see how it all works.

Comments closed

Parallelizing Notebook Runs in Microsoft Fabric via Python

Published 2024-01-30 by Kevin Feasel

Sandeep Pawar kicks off multiple notebooks at once:

The notebook class in mssparkutils has two methods to run notebooks – run and runMultiple . run allows you to trigger a notebook run for one single notebook. Mim wrote a nice blog to show how to use it and its usefulness.

runMultiple , on the other hand, allows you to create a Direct Acyclic Graph (DAG) of notebooks to execute notebooks in parallel and in specified order, similar to a pipeline run except in a notebook.

Read on to learn more about the advantages of this latter approach as well as how you can do it.

Comments closed

Notebooks versus Dataflow Gen2 in Microsoft Fabric

Published 2024-01-25 by Kevin Feasel

Gilbert Quevauvilliers takes us through a comparison:

In this blog post I am going to compare Dataflow Gen2 vs Notebook in terms of how much it costs for the workload. I will also compare usability as currently the dataflow gen2 has got a lot of built in features which makes it easier to use.

The goal of this blog post is to understand which in my opinion is cheaper and easier to use, which will then be the focus for future blog posts with regards to what I’ve learned along the way, which will hopefully assist you too.

To compare between the two workloads, I am going to be using the same source file as well as do the same transformations which will result in the same result.

Read on for a surprising difference in cost.

Comments closed

M	T	W	T	F	S	S
	1	2	3	4	5	6
7	8	9	10	11	12	13
14	15	16	17	18	19	20
21	22	23	24	25	26	27
28	29	30	31

Category: Notebooks

Using AI Skills as Cell Magics in Microsoft Fabric Notebooks

Defining the Default Lakehouse for a Fabric Notebook

Getting the Top N Results in a PySpark Notebook

Authenticating to Fabric APIs via Sempy and Service Principals

Running SemPy from Microsoft Fabric Notebooks

Renaming Multiple Columns in a PySpark Notebook

Renaming a Column in Microsoft Fabric via Python Notebook

Running Jupyter Notebooks inside a For Loop

Parallelizing Notebook Runs in Microsoft Fabric via Python

Notebooks versus Dataflow Gen2 in Microsoft Fabric