Press "Enter" to skip to content

Category: Notebooks

SandDance in Polyglot Notebooks

Matt Eland continues a series on dotnet Polyglot Notebooks:

As I’ve been doing more and more dotnet development in notebooks with Polyglot Notebooks, I’ve found myself wanting more options to create rich data visualizations inside of VS Code, the same way I could use Plotly for data visualization with Python and Jupyter Notebooks.

Thankfully, Microsoft SandDance exists and fills some of that gap for rich data visualization from dotnet code.

In this article I’ll talk more about what SandDance is, show you how you can use it inside of a Polyglot Notebook in VS Code, and show you a simple way you can use it without needing a Polyglot Notebook.

Most of my experience with SandDance was with Azure Data Studio, but it’s nice to see this capability in notebooks as well.
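To give a sense of how small the setup is, here’s a sketch of what a SandDance cell can look like. This assumes the SandDance.InteractiveExtension NuGet package and its ExploreWithSandDance() extension method are what Matt demonstrates; the namespaces and sample data below are my own guesses, so treat this as a shape rather than a recipe:

```csharp
// Load the SandDance extension into the notebook session.
// NOTE: package and namespace names are assumptions, not verified against the article.
#r "nuget: SandDance.InteractiveExtension, *-*"

using Microsoft.DotNet.Interactive.Formatting.TabularData;
using SandDance.InteractiveExtension;

// Any tabular-shaped data works; this sample is invented.
record Sale(string Region, int Units, decimal Revenue);

var sales = new[]
{
    new Sale("East", 120, 1450.00m),
    new Sale("West", 95, 1100.50m),
    new Sale("North", 143, 1720.25m),
};

// Convert to a tabular data resource and open the interactive SandDance explorer.
sales.ToTabularDataResource().ExploreWithSandDance().Display();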


Variable Sharing in Polyglot Notebooks

Matt Eland performs a few swaps:

While Polyglot Notebooks certainly brings the dream of notebook development to dotnet, Polyglot is at its finest when you work with one language and then hand off data to the next language for additional processing.

In this article we’ll talk about sharing variables between kernels using Polyglot Notebooks and VS Code. We’ll explore the syntax and tooling that exist around this functionality, as well as the current limitations of sharing variables between kernels.

For simplicity, I’m going to avoid getting into SQL and KQL kernels in this article, but I plan on delving further into each of these specialized kernels in future articles.

Click through for an example using the best .NET language, as well as C#. Do read the whole thing, especially if you think about passing around discriminated unions or method-rich objects.
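For a flavor of the syntax Matt covers, here’s a minimal sketch of cross-kernel sharing via the #!share magic command: a value cell holding a raw string and a C# cell that pulls it in, written out flat the way a .dib file stores cells (the variable name and data are hypothetical):

```csharp
#!value --name rawJson
{ "city": "Cleveland", "temperature": 68 }

#!csharp
// Pull the raw string from the value kernel into the C# kernel.
#!share --from value rawJson

using System.Text.Json;

var doc = JsonDocument.Parse(rawJson);
Console.WriteLine(doc.RootElement.GetProperty("city"));
// prints: Cleveland
```

The same pattern works between languages, e.g. #!share --from csharp rawJson inside an F# cell, with the caveats around complex types that Matt digs into.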


The Two File Formats for Polyglot Notebooks

Matt Eland chooses a file extension:

I’ve been talking more and more about Polyglot Notebooks and as people try it out, they tend to ask me one common question: should I create a .dib file or an .ipynb file? What’s the difference anyways?

In this short article I’m going to explore the .dib and .ipynb file formats and explain the difference between the two while answering the question of which one you should choose when creating your own notebooks in Polyglot Notebooks.

Read on for Matt’s thoughts. My tendency is to create them in .ipynb format for additional tooling support and potential cross-product flexibility (assuming you have the right kernels installed on your Jupyter server), though Matt explains his preference for .dib.
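For a sense of what the difference means on disk: .ipynb is a JSON document that also captures cell outputs, while a .dib file is just flat text with magic-command separators between cells and no stored outputs. A roughly representative, hand-written .dib file looks like the following (the exact metadata shape varies by version, so take the #!meta header with a grain of salt):

```csharp
#!meta
{"kernelInfo":{"defaultKernelName":"csharp","items":[{"name":"csharp"}]}}

#!markdown
# A documentation cell

Notes about the experiment go here.

#!csharp
// A code cell in the csharp kernel.
var answer = 6 * 7;
Console.WriteLine(answer);
```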


An Introduction to Polyglot Notebooks

Matt Eland walks us through a sample:

Polyglot Notebooks allows you to create notebooks composed of multiple cells. These cells can be either markdown cells for documentation or code cells containing C#, F#, PowerShell, SQL, KQL, HTML, JavaScript, or Mermaid markdown for diagramming.

This allows you to mix together rich documentation supported by little pieces of code that progressively expand upon an idea, tell a story, or otherwise provide insight or information to you as a developer.

Click through for the example. The thing I hadn’t realized—because I don’t really do this in Jupyter—is that you can share variables between languages. That’s a fairly useful feature when you want to do most of your work in one language but just happen to need a library using a different language.


Working with .NET Polyglot Notebooks

Matt Eland installs Polyglot Notebooks in VS Code:

Polyglot Notebooks is a powerful new interactive notebook technology that lets you run experiments in your editor, mix together code and rich documentation, and write code in a variety of languages to accomplish your tasks.

This article will guide you through the setup process to get Polyglot Notebooks running on your machine so you can do local data science and analytics notebooks using dotnet.

In case you’re curious, my installation has the ability to create notebooks in F#, C#, HTML, JavaScript, KQL, Markdown, Mermaid, PowerShell, and SQL. I’m not positive how many of those come with the extension itself and how many are additional kernels I got from installing other extensions.
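If you’re wondering the same about your own setup, a cell containing nothing but the #!lsmagic magic command should list the kernels and magic commands your installation knows about (assuming current builds still ship that directive):

```csharp
// List the kernels and magic commands available in this notebook session.
#!lsmagic
```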


Working with Remote Jupyter Books in Azure Data Studio

Steve Hughes reaches across the internet:

When working with Azure Data Studio and its support for Jupyter books, you will find there is an option for remote Jupyter books. You can open such a Jupyter book and follow the dialog for a couple of Microsoft books that are readily available.

Click through to see how this option differs from standard Jupyter books (which are themselves different from Jupyter notebooks) and how you can create one.


Unit Testing Spark Notebooks in Synapse

Arun Sethia grabs the oscilloscope:

In this blog post, we will cover how to create and run unit test cases for Spark jobs developed using Synapse notebooks. This is an extension of my previous blog, Synapse – Choosing Between Spark Notebook vs Spark Job Definition, where we discussed selecting between Spark Notebooks and Spark Job Definitions. Unit testing is an automated approach that developers use to test individual, self-contained code units. By verifying code behavior early, it helps streamline coding practices for larger systems.

Arun covers three major use cases: when your code is in an external library, when it is in a separate notebook, and when it is in the same notebook.


Orchestrating Synapse Notebooks and Spark Jobs from ADF

Abhishek Narain has an announcement:

Today, we are introducing support for orchestrating Synapse notebooks and Synapse Spark job definitions (SJD) natively from Azure Data Factory pipelines. It immensely helps customers who have invested in ADF and Synapse Spark, without requiring them to switch to Synapse Pipelines for orchestrating Synapse notebooks and SJDs.

Note: Synapse notebook and SJD activities were previously available only in Synapse Pipelines.

If you’re familiar with Synapse Pipelines, the equivalent ADF operations are extremely similar, as you’d probably expect.


Parallel Loading in Spark Notebooks

Dustin Vannoy answers some questions:

I received many questions on my tutorial Ingest tables in parallel with an Apache Spark notebook using multithreading. In this video and post, I address some of the questions that I couldn’t just answer in the YouTube comments. Watch the video for more complete answers, but here are quick responses with links to examples where appropriate.

Click through for the video and some text versions. Dustin includes examples for Synapse and Databricks.


Executing Multiple Notebooks in one Spark Pool with Genie

Shalu Ganotra Chadra, et al., explain what Synapse Genie is:

The Genie framework is a metadata-driven utility written in Python. It is implemented using threading (the ThreadPoolExecutor module) and a directed acyclic graph (the NetworkX library). It consists of a wrapper notebook that reads notebook metadata and executes the notebooks within a single Spark session. Each notebook is invoked on a thread with the mssparkutils.notebook.run() command, based on the available resources in the Spark pool. The dependencies between notebooks are understood and tracked through a directed acyclic graph.

Read on for more information about how you can use it and what the setup process looks like.
