Software Engineering Practices for Notebooks

Rafi Kurlansik and Austin Ford explain how to get the most out of notebooks, using Databricks as an example:

Notebooks are a popular way to start working with data quickly without configuring a complicated environment. Notebook authors can quickly go from interactive analysis to sharing a collaborative workflow, mixing explanatory text with code. Often, notebooks that begin as exploration evolve into production artifacts. For example,

1. A report that runs regularly based on newer data and evolving business logic.

2. An ETL pipeline that needs to run on a regular schedule, or continuously.

3. A machine learning model that must be re-trained when new data arrives.

Perhaps surprisingly, many Databricks customers find that with small adjustments, notebooks can be packaged into production assets, and integrated with best practices such as code review, testing, modularity, continuous integration, and versioned deployment.

Read on for several tips and recommendations.

Separating Code from Presentation with Jupyter

John Mount disaggregates Jupyter notebook results:

As I switch back and forth between R and Python projects for various clients and partners, I got to thinking: “is there an easy way to separate code from presentations in Jupyter notebooks?”

The answer turns out to be yes. Jupyter itself exposes a rich application programming interface in Python. So it is very easy to organize Jupyter’s power into tools that give me a great data science and analysis workflow in Python.

Read on to see how.
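To give a rough sense of what separating code from presentation can mean, here is a minimal sketch. It is not John's approach and does not use Jupyter's Python API at all; it relies only on the fact that an .ipynb file is a JSON document whose "cells" list tags each cell with a "cell_type". The function name split_notebook and the tiny EXAMPLE_NOTEBOOK are made up for illustration.

```python
import json


def split_notebook(nb):
    """Split a parsed .ipynb document into (code, prose) strings.

    `nb` is the dict you get from json.load() on a notebook file.
    """
    code_parts, prose_parts = [], []
    for cell in nb.get("cells", []):
        source = cell.get("source", "")
        # The notebook format stores source either as one string
        # or as a list of line strings; normalize to one string.
        if isinstance(source, list):
            source = "".join(source)
        if cell.get("cell_type") == "code":
            code_parts.append(source)
        elif cell.get("cell_type") == "markdown":
            prose_parts.append(source)
    return "\n\n".join(code_parts), "\n\n".join(prose_parts)


def load_notebook(path):
    """Read a notebook file from disk as a plain dict."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


# A hand-written, hypothetical notebook used only for demonstration.
EXAMPLE_NOTEBOOK = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,
            "outputs": [],
            "source": ["x = 1\n", "print(x)"],
        },
    ],
}

code, prose = split_notebook(EXAMPLE_NOTEBOOK)
```

With a real file, split_notebook(load_notebook("analysis.ipynb")) would return the same pair of strings, where "analysis.ipynb" is a placeholder path.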

SQL Tools Updates

Timi Oshin has updates on SSMS and Azure Data Studio:

Azure Data Studio 1.35 now supports easier keyboard navigation in notebooks without mouse clicking. This is done by hitting the Esc key and navigating between cell rows using the Up and Down arrow keys. To enter edit mode, hit the Enter key on the keyboard. The new Table Designer preview feature supports creating new tables and editing existing tables on a connected SQL Server instance. This is a highly requested product enhancement and enables more productive schema management with a modern, streamlined UX.

Haha! It only took several years but my hectoring finally pays off. Now for the full set of Jupyter keyboard shortcuts…

Working with Notebooks in Azure ML

I have started a new series:

In the prior series, Low-Code Machine Learning with Azure ML, we saw how to get started with Azure Machine Learning in a fairly pain-free way, especially for developers getting started with machine learning. In this series, I will assume that you already know all of those details and instead, we’re going to go full-code.

There are a few different ways in which we can go full-code with Azure ML. Today, we’re going to look at the easiest of those methods: using Jupyter notebooks within Azure ML Studio.

Read on for the first post in the series.

Databricks Notebook Discovery via Notebooks

Darin McBeth creates a meta-notebook to keep track of notebooks:

Elsevier has been a customer of Databricks for about six years. There are now hundreds of users and tens of thousands of notebooks across their workspace. To some extent, Elsevier’s Databricks users have been a victim of their own success, as there are now too many notebooks to search through to find some earlier work.

The Databricks workspace does provide a keyword search, but we often find the need to define advanced search criteria, such as creator, last updated, programming language, notebook commands and results.

Interestingly, we managed to achieve this functionality using a 100% notebook-based solution with Databricks functionalities. As you will see, this makes it easy to set up in a customer’s Databricks environment.

Read on to see how.
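As a toy illustration of the kind of advanced search criteria mentioned above (creator, last updated, language, notebook commands), here is a small sketch. It is not the authors' solution and does not call any Databricks API; NotebookRecord, search, and the sample records are all hypothetical and stand in for whatever catalog of notebook metadata you have collected.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class NotebookRecord:
    """Hypothetical metadata for one notebook in a catalog."""
    path: str
    creator: str
    language: str
    last_updated: date
    commands: list = field(default_factory=list)


def search(records, creator=None, language=None,
           updated_since=None, command_contains=None):
    """Return the records matching every criterion that was supplied."""
    matches = []
    for rec in records:
        if creator is not None and rec.creator != creator:
            continue
        if language is not None and rec.language.lower() != language.lower():
            continue
        if updated_since is not None and rec.last_updated < updated_since:
            continue
        # Case-insensitive substring match against any command text.
        if command_contains is not None and not any(
            command_contains.lower() in cmd.lower() for cmd in rec.commands
        ):
            continue
        matches.append(rec)
    return matches


# Made-up sample data, for demonstration only.
EXAMPLE_RECORDS = [
    NotebookRecord("/Users/alice/etl_daily", "alice", "python",
                   date(2022, 1, 10),
                   ["df = spark.read.parquet(src)",
                    "df.write.saveAsTable('sales')"]),
    NotebookRecord("/Users/bob/report", "bob", "sql",
                   date(2021, 6, 1),
                   ["SELECT * FROM sales"]),
    NotebookRecord("/Users/alice/old_analysis", "alice", "scala",
                   date(2019, 3, 5),
                   ['val df = spark.table("sales")']),
]

recent_by_alice = search(EXAMPLE_RECORDS, creator="alice",
                         updated_since=date(2020, 1, 1))
```

Criteria left as None are ignored, so the same function covers anything from a single-field lookup to a combined filter.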

Automating Notebook Execution with PowerShell

Julie Koesmarno shows off an automation process for notebooks:

When I first think about automation, I generally think in the following way: in order to automate a script, we want to ensure that the script itself can be run via a command line interface (CLI) and with almost no user interaction (except for input and output parameters). Now, how do we apply this to Jupyter Notebooks so that we can automate SQL notebooks or PowerShell Notebooks?

The good news is that these SQL notebooks and PowerShell notebooks that we’ve created using Azure Data Studio can be run on PowerShell CLI. If these notebooks can be run on PowerShell CLI, that means any automation systems or serverless architecture (Azure Automation combined with Azure Logic Apps as an example) should be able to run these notebooks also.

In this blog post, I’ll cover examples on using Invoke-SqlNotebook, using Invoke-ExecuteNotebook and putting it together with Azure Automation.

Click through to see the whole thing.

CI/CD with Databricks Notebooks and Azure DevOps

Michael Shtelma and Piotr Majer get us started on an MLOps journey:

This is the first part of a two-part series of blog posts that show how to configure and build end-to-end MLOps solutions on Databricks with notebooks and Repos API. This post presents a CI/CD framework on Databricks, which is based on Notebooks. The pipeline integrates with the Microsoft Azure DevOps ecosystem for the Continuous Integration (CI) part and Repos API for the Continuous Delivery (CD) part. In the second post, we’ll show how to leverage the Repos API functionality to implement a full CI/CD lifecycle on Databricks and extend it to a full-blown MLOps solution.

Click through for the article and a link to code. You can also see the pipeline YAML (and Python code it calls) in the repo.

Interactive .NET Notebooks in Visual Studio Code

Deborah Melkin tries out .NET Interactive Notebooks in Visual Studio Code:

These days, we tend to think of Azure Data Studio when we database developers talk about notebooks, specifically SQL Notebooks. But what Rob used for his demos is a new feature within VS Code called .NET Interactive Notebooks. It was developed in combination with the Azure Data Studio team and it has support for SQL. But the cool thing that intrigued me was that a notebook could support multiple kernels, unlike Azure Data Studio. Knowing how much we love our SQL and PowerShell and this being a feature that many of us want to see in SQL Notebooks, I decided to try and set this up and poke around.

Click through for Deb’s experiences. And I’ll also point out that .NET Interactive Notebooks supports the best .NET language (and the one which most naturally fits the ethos of notebooks), F#.

