Press "Enter" to skip to content

Category: Notebooks

The Two File Formats for Polyglot Notebooks

Matt Eland chooses a file extension:

I’ve been talking more and more about Polyglot Notebooks and as people try it out, they tend to ask me one common question: should I create a .dib file or an .ipynb file? What’s the difference anyways?

In this short article I’m going to explore the .dib and .ipynb file formats and explain the difference between the two while answering the question of which one you should choose when creating your own notebooks in Polyglot Notebooks.

Read on for Matt’s thoughts. My tendency is to create them in .ipynb format for additional tooling support and potential cross-product flexibility (assuming you have the right kernels installed on your Jupyter server), though Matt explains his preference for .dib.

Comments closed

An Introduction to Polyglot Notebooks

Matt Eland walks us through a sample:

Polyglot Notebooks allows you to create notebooks composed of multiple cells. These cells can be either markdown cells for documentation or code cells containing code in either C#, F#, PowerShell, SQL, KQL, HTML, JavaScript, or mermaid markdown for diagramming.

This allows you to mix together rich documentation supported by little pieces of code that progressively expand upon an idea, tell a story, or otherwise provide insight or information to you as a developer.

Click through for the example. The thing I hadn’t realized—because I don’t really do this in Jupyter—is that you can share variables between languages. That’s a fairly useful feature when you want to do most of your work in one language but just happen to need a library using a different language.

Comments closed

Working with .NET Polyglot Notebooks

Matt Eland installs Polyglot notebooks in VSCode:

Polyglot Notebooks is a powerful new interactive notebook technology that lets you run experiments in your editor, mix together code and rich documentation, and write code in a variety of languages to accomplish your tasks.

This article will guide you through the setup process to get Polyglot Notebooks running on your machine so you can do local data science and analytics notebooks using dotnet.

In case you’re curious, my installation has the ability to create notebooks in F#, C#, HTML, JavaScript, KQL, Markdown, Mermaid, Powershell, and SQL. I’m not positive how many of those come with the extension itself and how many are additional kernels I got from installing other extensions.

Comments closed

Working with Remote Jupyter Books in Azure Data Studio

Steve Hughes reaches across the internet:

When working with Azure Data Studio and its support of Jupyter books, you will find there is an option for remote Jupyter books. As shown in the image below, you can open that Jupyter book and follow through the dialogue for a couple of Microsoft books that are readily available.

Click through to see how this option differs from standard Jupyter books (which are themselves different from Jupyter notebooks) and how you can create one.

Comments closed

Unit Testing Spark Notebooks in Synapse

Arun Sethia grabs the oscilloscope:

In this blog post, we will cover how to test and create unit test cases for Spark jobs developed using Synapse Notebook. This is an extension of my previous blog, Synapse – Choosing Between Spark Notebook vs Spark Job Definition, where we discussed selecting between Spark Notebook and Spark Job Definition. Unit testing is an automated approach that developers use to test individual self-contained code units. By verifying code behavior early, it helps to streamline coding practices for larger systems.

Arun covers three major use cases: when your code is in an external library, when it is in a separate notebook, and when it is in the same notebook.

Comments closed

Orchestrating Synapse Notebooks and Spark Jobs from ADF

Abhishek Narain has an announcement:

Today, we are introducing support for orchestrating Synapse notebooks and Synapse spark job definitions (SJD) natively from Azure Data Factory pipelines. It immensely helps customers who have invested in ADF and Synapse Spark without requiring to switch to Synapse Pipelines for orchestrating Synapse Notebooks and SJD. 

NOTESynapse notebook and SJD activities were only available in Synapse Pipelines previously. 

If you’re familiar with Synapse Pipelines, the equivalent ADF operations are extremely similar, as you’d probably expect.

Comments closed

Parallel Loading in Spark Notebooks

Dustin Vannoy answers some questions:

I received many questions on my tutorial Ingest tables in parallel with an Apache Spark notebook using multithreading. In this video and post I address some of the questions that I couldn’t just answer in the YouTube comments. Watch the video for more complete answers but here are quick responses with links to examples where appropriate.

Click through for the video and some text versions. Dustin includes examples for Synapse and Databricks.

Comments closed

Executing Multiple Notebooks in one Spark Pool with Genie

Shalu Ganotra Chadra, et al, explain what Synapse Genie is:

The Genie framework is a metadata driven utility written in Python. It is implemented using threading (ThreadPoolExecutor module) and directed acyclic graph (Networkx library). It consists of a wrapper notebook, that reads metadata of notebooks and executes them within a single Spark session. Each notebook is invoked on a thread with MSSparkutils.run() command based on the available resources in the Spark pool. The dependencies between notebooks are understood and tracked through a directed acyclic graph.

Read on for more information about how you can use it and what the setup process looks like.

Comments closed

Sharing Results between Notebooks with MSSparkUtils

Liliam Leme provides an answer to a common Synapse Spark pool question:

I’ve been reviewing customer questions centered around “Have I tried using MSSparkUtils to solve the problem?”

One of the questions asked was how to share results between notebooks. Every time you hit “run” in a notebook, it starts a new Spark cluster which means that each notebook would be using different sessions. Making it impossible to share results between executions of notebooks. MSSparkUtils offers a solution to handle this exact scenario. 

Read on to see what MSSparkUtils is and how it helps in this case.

Comments closed

Choosing between Synapse Spark Notebooks or Job Definitions

Arun Sethia and Arshad Ali explain when you might use a Spark notebook versus a job definition:

Synapse Spark Notebook is a web-based (HTTP/HTTPS) interactive interface to create files that contain live code, narrative text, and visualizes output with rich libraries for spark based applications. Data engineers can collaborate, schedule, run, and test their spark application code using Notebooks. Notebooks are a good place to validate ideas and do quick experiments to get insight into the data. You can integrate the Synapse Notebook into Synapse pipeline.

The Notebook allows you to combine programming code with markdown text and perform simple visualizations (using Synapse Notebook chart options and open-source libraries). In addition, running code will supply immediate feedback, output, and progress tracking within Notebook.

Click through for the comparison.

Comments closed