Category: Notebooks

Jupyter Notebooks and Cosmos DB

Hasan Savran shows how we can use Jupyter notebooks with Cosmos DB:

After you enable the Notebook option, you are ready to analyze or visualize your data thanks to the Python language and Python packages. Cosmos DB makes it easy to write Python and install custom packages to use with your data. There are a couple of great internal commands and wildcards you should know if you'd like to use Notebooks in Azure Cosmos DB. The first one I want to introduce is the %%sql command. This command lets you select data from your containers by using the SQL API. You can select data and add it to your Python data frames. You need to define which database and container you want to use before you pass your query. In the following example, I want to use my database named Stackoverflow and my container named Posts. Then I pass the query.
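Based on Hasan's description, a %%sql cell along those lines might look like the sketch below. The database and container names come from his example; the --output flag, which would land the results in a pandas data frame named df_posts, is my assumption about the magic's options rather than something quoted from his post.

```
%%sql --database Stackoverflow --container Posts --output df_posts
SELECT TOP 10 c.id, c.Score
FROM c
WHERE c.Score > 0
```

From there, df_posts would be available to the Python cells that follow.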

These are internal notebooks, meaning there's no separate Jupyter server required. There's a separate way of working with the Cosmos DB API from external notebooks.

Distributing Notebooks

Grant Fritchey wants to know where to buy notebooks and notebook accessories:

I’m myopically focused at the moment on Azure Data Studio, but there are a lot of other places and ways to create or consume notebooks. However, I’m going to keep my focus.

The issue I'm running into is distributing the notebooks.

There are a lot of great comments. Before reading them, here’s my answer:

  • GitHub repos, like Grant mentions. They're good, though I have the same feeling about a production notebook that I do about an SSIS package: notebooks are binaries (after a fashion). For pedagogical purposes, I'll absolutely slap notebooks into GitHub, typically without data. But for a real data science project, those notebooks can get hefty when you store all of the data in them, and it's really hard to diff the JSON to understand what changed (see the output-stripping sketch after this list).
  • Binder and Azure Notebooks are services which let you host notebooks remotely. Binder reads from a GitHub repo and spins up a virtual environment for you. Azure Notebooks lets you run notebooks (including F# notebooks) against free VMs in Azure, or you can use your own VM for more power. Azure Notebooks let you fork projects pretty easily. I haven’t used Google Colab but it looks pretty similar to Azure Notebooks.
  • When you start up Jupyter Notebooks, you’re really starting a server. You can have a server running in your environment with your team’s notebooks. I’d probably still drop them in source control as well.
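On the diffing problem from the first bullet, one low-tech mitigation is to strip outputs before committing, so the JSON only contains code and markdown. Here's a minimal sketch using nbformat; the file name is hypothetical, and tools like nbstripout automate the same idea.

```python
import nbformat

# Strip outputs and execution counts so the committed JSON stays small
# and diffs show only code and markdown changes.
nb = nbformat.read("analysis.ipynb", as_version=4)  # hypothetical file name
for cell in nb.cells:
    if cell.cell_type == "code":
        cell.outputs = []
        cell.execution_count = None
nbformat.write(nb, "analysis.ipynb")
```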

Converting Databricks Notebooks to ipynb

Dave Wentzel shows how we can convert a Databricks notebook (in DBC format) to a normal Jupyter notebook (in ipynb format):

Databricks natively stores its notebook files by default as DBC files, a closed, binary format. A .dbc file has the nice benefit of being self-contained. One dbc file can contain an entire folder of notebooks and supporting files. But other than that, dbc files are frankly obnoxious.

Read on to see how to convert between these two formats.
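As a side note (and not Dave's method), if the notebook still lives in a Databricks workspace, you can sidestep the DBC file entirely by asking the workspace export REST API for Jupyter format. A rough sketch, with the host, token, and paths as placeholders:

```python
import base64
import requests

# Placeholders: point these at your own workspace and notebook.
host = "https://adb-1234567890123456.7.azuredatabricks.net"
token = "dapi-xxxxxxxx"  # personal access token
notebook_path = "/Users/someone@example.com/MyNotebook"

# Request the notebook in Jupyter (.ipynb) format.
resp = requests.get(
    f"{host}/api/2.0/workspace/export",
    headers={"Authorization": f"Bearer {token}"},
    params={"path": notebook_path, "format": "JUPYTER"},
)
resp.raise_for_status()

# The export endpoint returns the file content base64-encoded.
with open("MyNotebook.ipynb", "wb") as f:
    f.write(base64.b64decode(resp.json()["content"]))
```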

Querying SQL Server from Python

Hasan Savran builds an Azure Data Studio notebook to query SQL Server from Python:

SQL Kernel is the default language; to query the database with Python, change SQL to Python 3. You will probably see the following message if this is the first time you are trying this. You need to install Python packages to be able to run Python scripts. I have Visual Studio installed on my machine and I already have Python, so I thought I could use it by clicking “Use existing Python installation.” I was wrong; I couldn't. This option looks for local installation files, and when I point to the Visual Studio Python files, it throws an error in the middle of the installation. So, I will ignore this option for now.

In ADS, I haven’t gotten “Use existing Python location” to work either, so Hasan’s not alone in that regard.
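Once the Python kernel is installed, the query itself tends to be a few lines of pyodbc plus pandas. This is a generic sketch rather than Hasan's exact code; the connection string, database, and table are placeholders.

```python
import pandas as pd
import pyodbc

# Placeholder connection details; swap in your own server, database, and auth.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=localhost;DATABASE=StackOverflow;"
    "Trusted_Connection=yes;"
)

# Pull a small result set into a pandas data frame.
df = pd.read_sql(
    "SELECT TOP (10) Id, Title, Score FROM dbo.Posts ORDER BY Score DESC;",
    conn,
)
df.head()
```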

JupyterLab Integration for Databricks

Bernhard Walter announces an integration between JupyterLab and Databricks:

This blog post starts with a quick overview of what using a remote Databricks cluster from your local JupyterLab looks like. It then provides an end-to-end example of working with JupyterLab Integration, followed by an explanation of the differences from Databricks Connect. If you want to try it yourself, the last section explains the installation.

I like this a lot, as it fights back a bit against the balkanization of data science: it means I don’t need to keep one set of notebooks here and another set of notebooks there and a third set of notebooks somewhere else.

Structuring Databricks Notebooks

Paul Andrew has put together a basic structure for Databricks notebooks using titles, markdown, and widgets:

For me, one of the hardest parts of developing anything is when you need to pick up and rework code that has been created by someone else. That said, my preferred Notebook structure shown below is not about technical performance or anything complicated. This is simply for ease of sharing and understanding, as well as some initial documentation for work done.

In my example I created a Scala Notebook, but this could of course apply to any flavour.

This makes good use of markdown capabilities without being too heavy. I like it. The same general principles apply if you’re putting together Jupyter notebooks outside of Databricks.
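To make that concrete, here's a stripped-down skeleton in the spirit of Paul's layout: a markdown header cell, a widgets cell for parameters, and a cell for the actual work. It's Python rather than his Scala, the names and paths are hypothetical, and dbutils and spark are provided by the Databricks runtime.

```python
# Cell 1 (markdown):
# %md
# # Load Posts
# **Purpose:** load the raw Posts feed into a curated table
# **Author / date / change history** documented here

# Cell 2 (code): parameters surfaced as widgets
dbutils.widgets.text("source_path", "/mnt/raw/posts", "Source path")
dbutils.widgets.text("target_table", "curated.posts", "Target table")

# Cell 3 (code): the work itself, driven by the widget values
source_path = dbutils.widgets.get("source_path")
target_table = dbutils.widgets.get("target_table")
df = spark.read.json(source_path)
df.write.mode("overwrite").saveAsTable(target_table)
```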

Creating Azure Data Studio Notebooks Using Powershell

Rob Sewell inverts the “Use Azure Data Studio to create Powershell notebooks” mantra:

This module contains only three commands at present:

* Convert-ADSPowerShellForMarkdown: creates the markdown link for embedding PowerShell code in a text cell for a SQL Notebook, as described in this blog post
* New-ADSWorkBookCell: creates a workbook text cell or a code cell for adding to the New-ADSWorkBook command
* New-ADSWorkBook: creates a new SQL Notebook using the cell objects created by New-ADSWorkBookCell

Click through for an example.

Powershell Notebooks in Azure Data Studio

Aaron Nelson announces a new feature in Azure Data Studio:

In order to get all the nice intellisense and tab completion features of the PowerShell language inside your PowerShell Notebooks, be sure to install the PowerShell extension from the Azure Data Studio marketplace.

At this point, the biggest remaining language is R, though I’d love to see F# support as well (hey, Azure Notebooks offers F# support).

Fun with Markdown in Azure Data Studio

Dave Bland takes us through some of the formatting options available in Azure Data Studio notebooks:

When working in a Notebook, you have two types of cells, text and code. The focus of this post is how to format the text cell. Of course, text goes into this cell, so that part is easy, and the text can say anything you would like it to say. When we work with text in Word, there is a formatting toolbar that we can use to make it look the way we want. Text cells do not have this toolbar.

You might be asking, without the format toolbar, does that mean we can't format the text? The answer is no; we can still format the text, we just need to do it slightly differently. Rather than using a toolbar, we need to use characters.

There’s a lot of power in Markdown.
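If you'd rather see those characters than read about them, here's a small sketch that uses nbformat to write a notebook with one text cell exercising a few of the common markdown elements; the file name is hypothetical.

```python
import nbformat

# Build a notebook whose single text cell shows headings, emphasis,
# inline code, and a bulleted list.
nb = nbformat.v4.new_notebook()
nb.cells.append(nbformat.v4.new_markdown_cell(
    "# A heading\n"
    "Some **bold** text, some *italic* text, and some `inline code`.\n\n"
    "- first bullet\n"
    "- second bullet\n"
))
nbformat.write(nb, "markdown-demo.ipynb")  # hypothetical file name
```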
