Press "Enter" to skip to content

Category: Notebooks

Spark Streaming in a Databricks Notebook

Tomaz Kastrun shows off Spark Streaming in a Databricks notebook:

Spark Streaming is a process that can analyse not only batches of data but also streams of data in near real-time. It enables powerful interactive and analytical applications across both hot and cold data (streaming data and historical data). Spark Streaming is a fault-tolerant system: thanks to the lineage of operations, Spark always remembers where you stopped, and in case of a worker failure, another worker can recreate all the data transformations from the partitioned RDDs (assuming that all the RDD transformations are deterministic).

Click through for the demo.
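
For a flavor of what this looks like in a notebook, here is a minimal Structured Streaming sketch (the newer streaming API on the same engine); the input path and schema are made up, and Tomaz's demo has the real details:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("streaming-sketch").getOrCreate()

# Watch a directory for incoming JSON files (hypothetical path and schema).
events = (
    spark.readStream
    .schema("device STRING, reading DOUBLE")
    .json("/tmp/incoming-events")
)

# Maintain a running count of readings per device.
counts = events.groupBy("device").count()

# Stream the running counts to the console sink; in a Databricks
# notebook, display(counts) renders the stream inline instead.
query = (
    counts.writeStream
    .outputMode("complete")
    .format("console")
    .start()
)
```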

Linking between Notebooks in Azure Data Studio

Julie Koesmarno shows us the rules of linking notebooks in Azure Data Studio:

When writing a notebook, it can be very handy to be able to refer to a specific part of a notebook and allow the readers to jump to that part, i.e., linking or anchoring. Using this technique, you can also create an index list, a table of contents, or cross-references to parts of other notebooks. Check out my demo notebook for this linking topic, from the MsSQLGirl GitHub repo.

Read on for those rules.
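
For orientation, the general Markdown pattern for anchors looks like the snippet below; note that Azure Data Studio has its own rules for how heading text maps to anchors, which is exactly what Julie's post covers (file name here is hypothetical):

```markdown
<!-- Link to a heading within the same notebook: the anchor is typically
     the heading text, lower-cased, with spaces replaced by hyphens. -->
[Jump to the Setup section](#setup)

<!-- Link to another notebook in the same folder (file name is hypothetical). -->
[See the data-loading notebook](./load-data.ipynb)
```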

Using Notebooks to Load Data into the Databricks File System

Tomaz Kastrun is putting together an Advent of Azure Databricks:

Yesterday we started working towards data import and how to use the drop zone to import data to DBFS. We also created our first notebook, and this is where I would like to start today: with a light introduction to notebooks.

Read on for an introduction to notebooks, as well as an example which loads data into the Databricks File System (DBFS).
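
As a taste of what that looks like, here is a minimal sketch using the dbutils and spark objects built into a Databricks notebook; the paths and file name are placeholders:

```python
# List the files sitting in the DBFS drop zone (path is a placeholder).
display(dbutils.fs.ls("dbfs:/FileStore/tables"))

# Load a CSV file from DBFS into a Spark DataFrame.
df = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("dbfs:/FileStore/tables/sample_data.csv")
)

display(df)
```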

Kusto Queries in Azure Data Studio Notebooks

Julie Koesmarno shows off the Kusto Query Language magic in Azure Data Studio notebooks:

To do this, you’ll need to ensure that you have Kqlmagic installed. See Install and set up Kqlmagic in a notebook. Then in a notebook, you can load Kqlmagic with %reload_ext Kqlmagic in a code cell.

The next step, in a new code cell, is to connect to a Log Analytics workspace. There are three ways to do so (roughly, as I'm also learning in this space):

1. Using Azure Active Directory Device Login authentication
2. Using Az CLI login
3. Using Client Secret

Read on for one example using Azure AD authentication.
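
As a rough sketch of the device-login route (the first option above), the cells look something like this; the workspace ID is a placeholder, and the exact connection-string syntax can vary by Kqlmagic version, so check the Kqlmagic documentation:

```python
# Load the Kqlmagic extension.
%reload_ext Kqlmagic

# Connect to a Log Analytics workspace via Azure AD device login
# (the workspace ID here is a placeholder).
%kql loganalytics://code;workspace='00000000-0000-0000-0000-000000000000'

# Run a KQL query against the connected workspace.
%kql Heartbeat | summarize count() by Computer | top 5 by count_
```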

Creating Jupyter Books in Azure Data Studio

Drew Skwiers-Koballa takes us through creating and deploying Jupyter Books:

The notebook experience in Azure Data Studio allows users to create and share documents containing live code, execution results, and narrative text. Potential usage includes data cleaning and transformation, statistical modeling, troubleshooting guides, data visualization, and machine learning. Jupyter books compile a collection of notebooks into a richer experience with more structure and a table of contents. In Azure Data Studio we are able not only to use Jupyter books but also to create and share them. Learn the basics of notebooks in Azure Data Studio from the documentation and read on to learn how to leverage a GitHub Action to publish and share remote Jupyter books.

Click through for the process of creating, opening, and distributing Jupyter Books.
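
For orientation, a Jupyter Book is essentially a folder of notebooks plus a table of contents. The sketch below shows one plausible shape of a legacy-style toc.yml, with hypothetical titles and paths; treat the exact schema as an assumption and see Drew's post for the specifics:

```yaml
# _data/toc.yml -- table of contents for a simple Jupyter Book
# (titles and paths are hypothetical).
- title: Welcome
  url: /readme
- title: Troubleshooting Guides
  sections:
    - title: Check server health
      url: /notebooks/server-health
    - title: Review wait statistics
      url: /notebooks/wait-stats
```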

Azure Data Studio November 2020 Release

Alan Yu announces the November 2020 release of Azure Data Studio:

Another feature request was to provide support for parameters in a notebook. Parameterization is the ability to execute the same notebook with different parameters.

With this release of Azure Data Studio, users will now be able to utilize Papermill’s ability to parameterize, execute, and store notebooks. Designating the parameters cell as the first code cell in your notebook ensures that the injected parameters in the output notebook are placed directly after the original parameters cell. That way, the parameterized notebook uses the newly injected parameters instead of the original parameters cell.

Users can utilize the Papermill CLI as well as the Python API to pass in a new set of parameters quickly and efficiently, as shown below.

That does look interesting.
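
For a sense of the mechanics, here is a minimal Papermill sketch using its Python API; the notebook file names and parameter names are made up:

```python
import papermill as pm

# Execute a notebook, injecting new values over the tagged parameters cell.
# File names and parameter names below are hypothetical.
pm.execute_notebook(
    "daily-report.ipynb",
    "daily-report-output.ipynb",
    parameters={"start_date": "2020-11-01", "row_limit": 1000},
)
```

The CLI equivalent would be `papermill daily-report.ipynb daily-report-output.ipynb -p start_date 2020-11-01 -p row_limit 1000`.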
