Category: Notebooks

.NET and Powershell 7 Notebooks

Published 2020-02-10 by Kevin Feasel

Rob Sewell forwards on some exciting news:

A notebook experience for PowerShell 7 that sounds amazing. This will enable a true cross-platform PowerShell Notebook experience which is lacking from the Python version as it uses Windows PowerShell on Windows and PowerShell Core on other OS’s
The first thing I asked was – Will this come to Azure Data Studio. I got an immediate response from Sydney Smith PowerShell Program Manager saying it is on the roadmap

Two notes of importance. First, these are kernels for Jupyter Notebooks and not Azure Data Studio or VS Code (yet). Second, Rob buried the lede on the most important language in there: F#. You can also read the full announcement from Maria Naggaga, to which Rob linked.

Comments closed

Updating the Powershell Kernel in Azure Data Studio Notebooks

Published 2020-02-03 by Kevin Feasel

Bob Pusateri has a two-parter on Powershell notebooks. First up is the problem:

PowerShell Notebooks are a great new feature in Azure Data Studio, first becoming available in the November 2019 release. Like SQL notebooks, PowerShell notebooks are based on Jupyter Notebooks format, which are interactive documents containing text and executable code blocks.
Having some working PowerShell code that I wanted to share along with explanations and examples, I created a PowerShell Notebook. The only problem was my functions would never initialize. Actually they would never stop initializing – I would run the cell they were defined in, and it would just keep running forever.

And then Bob has the solution:

It turns out I did not have the latest version of the PowerShell Kernel running on my machine. The latest version is currently 0.1.3, and I had 0.1.2. Upgrading appears to have solved this issue for me – yay!
This solution also raises the issue that there is no notification from Azure Data Studio that a PowerShell Kernel exists or is in need of updating. I (and probably others) will just believe that as long as Azure Data Studio is up to date, we’re good to go. So how does one update their PowerShell kernel? Well, it’s simple, but not intuitive.

Read on to see how.

Comments closed

SQL Diagnostic Book Update

Published 2020-02-03 by Kevin Feasel

Emanuele Meazzo has an update to the SQL Diagnostic Book:

Just a quick note, the SQL Diagnostic Book has been updated, here is the changelog:
– Updated FirstResponder’s Kit and Glen Berry’s Scripts
– Added SQL Assensment API Notebooks
– Added sp_whoisactive installation and basic documentation notebooks

Click through to see where you can get the book.

Comments closed

Databricks Automated Deployment and Testing

Published 2020-01-17 by Kevin Feasel

Li Yu, et al, explain how to use Databricks notebooks and MLflow to automate deployment and testing of Spark solutions:

Today many data science (DS) organizations are accelerating the agile analytics development process using Databricks notebooks. Fully leveraging the distributed computing power of Apache Spark™, these organizations are able to interact easily with data at multi-terabytes scale, from exploration to fast prototype and all the way to productionize sophisticated machine learning (ML) models. As fast iteration is achieved at high velocity, what has become increasingly evident is that it is non-trivial to manage the DS life cycle for efficiency, reproducibility, and high-quality. The challenge multiplies in large enterprises where data volume grows exponentially, the expectation of ROI is high on getting business value from data, and cross-functional collaborations are common.
In this blog, we introduce a joint work with Iterable that hardens the DS process with best practices from software development. This approach automates building, testing, and deployment of DS workflow from inside Databricks notebooks and integrates fully with MLflow and Databricks CLI. It enables proper version control and comprehensive logging of important metrics, including functional and integration tests, model performance metrics, and data lineage. All of these are achieved without the need to maintain a separate build server.

Read on to see how.

Comments closed

A Diagnostic Book for SQL Server

Published 2020-01-02 by Kevin Feasel

Emanuele Meazzo has a new years’s gift for us:

Welcome to 2020! I wanted to start this year by giving to all my fellow consultants another way to troubleshoot our beloved SQL Servers; I’ve already talked about diagnostic notebooks in the past, and now, since Azure Data Studio has implemented the feature, I wanted to group them into a Diagnostic Book.
As the name implies, a jupyter book is no other than a collection of notebooks (and markdown files) that groups everything in a coherent space, with an index and navigation options alike.

I think this sort of collection of notebooks (a, uh, note-book), if put together well, makes it easier to learn a new environment and understand key problems than a big Scripts.txt file or a folder full of scripts.

Comments closed

Jupyter Notebooks and Cosmos DB

Published 2019-12-18 by Kevin Feasel

Hasan Savran shows how we can use Jupyter notebooks with Cosmos DB:

After you enable the Notebook options, you are ready to analyze or visualize your data thanks to Python language and Python packages. Cosmos DB makes your life easy to write Python and install custom packages to use with your data. There are couple of great internal commands and wildcards you should know if you like to use Notebooks in Azure Cosmos DB. First one I want to introduce you is, %%sql command. This command lets you select data from your containers by using SQL API. You can select data and add it to your Python data frames. You need to define which database and container you want to use before you pass your query. Here is an example. In the following example, I want to use my database named Stackoverflow, and container named Posts. Then I pass the query.

These are internal notebooks, meaning no separate Jupyter server required. There’s a separate way of learning the Cosmos API from external notebooks.

Comments closed

Distributing Notebooks

Published 2019-12-12 by Kevin Feasel

Grant Fritchey wants to know where to buy notebooks and notebook accessories:

I’m myopically focused at the moment on Azure Data Studio, but there are a lot of other places and ways to create or consume notebooks. However, I’m going to keep my focus.
The issue I’m running into, is distributing the notebooks.

There are a lot of great comments. Before reading them, here’s my answer:

GitHub repos, like Grant mentions. They’re good, though I have the same feeling about a production notebook that I do about an SSIS package: notebooks are binaries (after a fashion). For pedagogical purposes, I’ll absolutely slap notebooks into GitHub, typically without data. But for a real data science project, those notebooks can get hefty when you store all of the data in them, and it’s really hard to diff the JSON to understand what changed.
Binder and Azure Notebooks are services which let you host notebooks remotely. Binder reads from a GitHub repo and spins up a virtual environment for you. Azure Notebooks lets you run notebooks (including F# notebooks) against free VMs in Azure, or you can use your own VM for more power. Azure Notebooks let you fork projects pretty easily. I haven’t used Google Colab but it looks pretty similar to Azure Notebooks.
When you start up Jupyter Notebooks, you’re really starting a server. You can have a server running in your environment with your team’s notebooks. I’d probably still drop them in source control as well.

Comments closed

Converting Databricks Notebooks to ipynb

Published 2019-12-09 by Kevin Feasel

Dave Wentzel shows how we can convert a Databricks notebook (in DBC format) to a normal Jupyter notebook (in ipynb format):

Databricks natively stores it’s notebook files by default as DBC files, a closed, binary format. A .dbc file has a nice benefit of being self-contained. One dbc file can consist of an entire folder of notebooks and supporting files. But other than that, dbc files are frankly obnoxious.

Read on to see how to convert between these two formats.

Comments closed

Querying SQL Server from Python

Published 2019-12-09 by Kevin Feasel

Hasan Savran builds an Azure Data Studio notebook to query SQL Server from Python:

SQL Kernel is the default language, to query database with Python change SQL to Python 3. Probably, you will see the following message if this is the first time you are trying this. You need to install Python packages to be able to run python scripts. I have Visual Studio installed on my machine and I already have Python, I taught I could to use it by clicking “Use existing Python installation”. I was wrong, I couldn’t. This option looks for local installation files and when I point to Visual Studio Python files, it throws error in the middle of the installation. So, I will ignore this option for now.

In ADS, I haven’t gotten “Use existing Python location” to work either, so Hasan’s not alone in that regard.

Comments closed

JupyterLab Integration for Databricks

Published 2019-12-05 by Kevin Feasel

Bernhard Walter announces an integration between JupyterLab and Databricks:

This blog post starts with a quick overview how using a remote Databricks cluster from your local JupyterLab would look like. It then provides an end to end example of working with JupyterLab Integration followed by explaining the differences to Databricks Connect. If you want to try it yourself, the last section explains the installation.

I like this a lot, as it fights back a bit against the balkanization of data science: it means I don’t need to keep one set of notebooks here and another set of notebooks there and a third set of notebooks somewhere else.

Comments closed

M	T	W	T	F	S	S
		1	2	3	4	5
6	7	8	9	10	11	12
13	14	15	16	17	18	19
20	21	22	23	24	25	26
27	28	29	30	31