
Category: Notebooks

Using Pester with .NET PowerShell Notebooks

Rob Sewell has PowerShell in notebooks, so of course Rob is going to write tests:

Using Pester to validate that an environment is as you expect it is a good resource for incident resolution, potentially enabling you to quickly establish an area to concentrate on for the issue. However, if you try to run Pester in a .NET Notebook you will receive an error

Click through for the reason why this error appears and a workaround until it’s fixed for real.

Comments closed

Installing .NET Notebooks for PowerShell

Max Trinidad shows us how to install .NET Interactive on Linux:

In Windows, just takes a few steps to set it up. For Linux, it takes a few extra steps but still is quick enough to get you started.

For Windows, follow the instructions found at the .NET Interactive page in Github.

For Linux, for Ubuntu 18.04, follow the blog post “Ubuntu 18.04 Package Manager – Install .NET Core“.

Basically, in either operating systems, you install:

Install the .NET Core SDK
Install the ASP.NET Core runtime
Install the .NET Core runtime

Click through for the step-by-step instructions. Once you're done, you get not only PowerShell but also F# and C#.


.NET and PowerShell 7 Notebooks

Rob Sewell forwards on some exciting news:

A notebook experience for PowerShell 7 that sounds amazing. This will enable a true cross-platform PowerShell Notebook experience which is lacking from the Python version as it uses Windows PowerShell on Windows and PowerShell Core on other OS’s

The first thing I asked was – Will this come to Azure Data Studio. I got an immediate response from Sydney Smith PowerShell Program Manager saying it is on the roadmap

Two notes of importance. First, these are kernels for Jupyter Notebooks and not Azure Data Studio or VS Code (yet). Second, Rob buried the lede on the most important language in there: F#. You can also read the full announcement from Maria Naggaga, to which Rob linked.


Updating the PowerShell Kernel in Azure Data Studio Notebooks

Bob Pusateri has a two-parter on PowerShell notebooks. First up is the problem:

PowerShell Notebooks are a great new feature in Azure Data Studio, first becoming available in the November 2019 release. Like SQL notebooks, PowerShell notebooks are based on Jupyter Notebooks format, which are interactive documents containing text and executable code blocks.

Having some working PowerShell code that I wanted to share along with explanations and examples, I created a PowerShell Notebook. The only problem was my functions would never initialize. Actually they would never stop initializing – I would run the cell they were defined in, and it would just keep running forever.

And then Bob has the solution:

It turns out I did not have the latest version of the PowerShell Kernel running on my machine. The latest version is currently 0.1.3, and I had 0.1.2. Upgrading appears to have solved this issue for me – yay!

This solution also raises the issue that there is no notification from Azure Data Studio that a PowerShell Kernel exists or is in need of updating. I (and probably others) will just believe that as long as Azure Data Studio is up to date, we’re good to go. So how does one update their PowerShell kernel? Well, it’s simple, but not intuitive.

Read on to see how.


Databricks Automated Deployment and Testing

Li Yu, et al., explain how to use Databricks notebooks and MLflow to automate deployment and testing of Spark solutions:

Today many data science (DS) organizations are accelerating the agile analytics development process using Databricks notebooks.  Fully leveraging the distributed computing power of Apache Spark™, these organizations are able to interact easily with data at multi-terabytes scale, from exploration to fast prototype and all the way to productionize sophisticated machine learning (ML) models.  As fast iteration is achieved at high velocity, what has become increasingly evident is that it is non-trivial to manage the DS life cycle for efficiency, reproducibility, and high-quality. The challenge multiplies in large enterprises where data volume grows exponentially, the expectation of ROI is high on getting business value from data, and cross-functional collaborations are common.

In this blog, we introduce a joint work with Iterable that hardens the DS process with best practices from software development.  This approach automates building, testing, and deployment of DS workflow from inside Databricks notebooks and integrates fully with MLflow and Databricks CLI. It enables proper version control and comprehensive logging of important metrics, including functional and integration tests, model performance metrics, and data lineage. All of these are achieved without the need to maintain a separate build server.

Read on to see how.


A Diagnostic Book for SQL Server

Emanuele Meazzo has a New Year’s gift for us:

Welcome to 2020! I wanted to start this year by giving to all my fellow consultants another way to troubleshoot our beloved SQL Servers; I’ve already talked about diagnostic notebooks in the past, and now, since Azure Data Studio has implemented the feature, I wanted to group them into a Diagnostic Book.

As the name implies, a jupyter book is no other than a collection of notebooks (and markdown files) that groups everything in a coherent space, with an index and navigation options alike.

I think this sort of collection of notebooks (a, uh, note-book), if put together well, makes it easier to learn a new environment and understand key problems than a big Scripts.txt file or a folder full of scripts.


Jupyter Notebooks and Cosmos DB

Hasan Savran shows how we can use Jupyter notebooks with Cosmos DB:

After you enable the Notebook options, you are ready to analyze or visualize your data thanks to Python language and Python packages. Cosmos DB makes your life easy to write Python and install custom packages to use with your data. There are couple of great internal commands and wildcards you should know if you like to use Notebooks in Azure Cosmos DB. First one I want to introduce you is, %%sql command. This command lets you select data from your containers by using SQL API. You can select data and add it to your Python data frames. You need to define which database and container you want to use before you pass your query. Here is an example. In the following example, I want to use my database named Stackoverflow, and container named Posts. Then I pass the query.

These are internal notebooks, meaning no separate Jupyter server is required. There’s a separate way of learning the Cosmos API from external notebooks.


Distributing Notebooks

Grant Fritchey wants to know where to buy notebooks and notebook accessories:

I’m myopically focused at the moment on Azure Data Studio, but there are a lot of other places and ways to create or consume notebooks. However, I’m going to keep my focus.

The issue I’m running into, is distributing the notebooks.

There are a lot of great comments. Before reading them, here’s my answer:

  • GitHub repos, like Grant mentions. They’re good, though I have the same feeling about a production notebook that I do about an SSIS package: notebooks are binaries (after a fashion). For pedagogical purposes, I’ll absolutely slap notebooks into GitHub, typically without data. But for a real data science project, those notebooks can get hefty when you store all of the data in them, and it’s really hard to diff the JSON to understand what changed.
  • Binder and Azure Notebooks are services which let you host notebooks remotely. Binder reads from a GitHub repo and spins up a virtual environment for you. Azure Notebooks lets you run notebooks (including F# notebooks) against free VMs in Azure, or you can use your own VM for more power. Azure Notebooks also lets you fork projects pretty easily. I haven’t used Google Colab but it looks pretty similar to Azure Notebooks.
  • When you start up Jupyter Notebooks, you’re really starting a server. You can have a server running in your environment with your team’s notebooks. I’d probably still drop them in source control as well.
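
To make the diffing problem from the first bullet a bit more concrete, here is a minimal sketch in Python of one common mitigation: clearing code-cell outputs before a notebook goes into source control. It assumes the notebook follows the version 4 Jupyter format (a top-level "cells" list whose code cells carry "outputs" and "execution_count"). The strip_outputs function and the inlined sample notebook are hypothetical names made up for illustration, and dedicated tools such as nbstripout handle many more cases than this does.

```python
import json


def strip_outputs(notebook: dict) -> dict:
    """Return a copy of a version 4 notebook with code-cell outputs cleared."""
    # Round-tripping through JSON is a cheap deep copy for JSON-safe data,
    # so the caller's notebook is left untouched.
    cleaned = json.loads(json.dumps(notebook))
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []            # drop stored results and plots
            cell["execution_count"] = None  # drop the run counter, which changes on every run
    return cleaned


# Hypothetical two-cell notebook, inlined so the sketch is self-contained.
sample = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Row counts"]},
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": 7,
            "source": ["len(rows)"],
            "outputs": [
                {
                    "output_type": "execute_result",
                    "execution_count": 7,
                    "metadata": {},
                    "data": {"text/plain": ["1000000"]},
                }
            ],
        },
    ],
}

stripped = strip_outputs(sample)
# Sorted keys and a fixed indent keep the serialized form stable between commits.
print(json.dumps(stripped, indent=1, sort_keys=True))
```

With outputs removed, what remains in the JSON is mostly the markdown and code source, so a diff shows changes to the cells rather than changes to the data that happened to be rendered. The trade-off is that readers have to rerun the notebook to see results.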