Press "Enter" to skip to content

Category: Notebooks

Creating Big Data Clusters with Azure Data Studio

Niels Berglund takes us through the creation of a Big Data Cluster by using Azure Data Studio to generate a notebook:

I wrote a blog post back in November 2018 about how to install and deploy SQL Server 2019 Big Data Cluster on Azure Kubernetes Service. Back then, SQL Server 2019 Big Data Cluster was in private preview (CTP 2.1, I believe), and you had to sign up to get access to the “bits”. Well, you did not really get any “bits”; what you did get was access to Python deployment scripts.

Now, September 2019, the BDC is in public preview (you do not have to sign up), and it has reached Release Candidate (RC) status, RC 1. The install method has changed, or rather, in addition to installing via deployment scripts, you can now also install using Azure Data Studio deployment notebooks, and that is what this blog post is about.

Having gone through this myself, I can say there’s quite a bit of reading involved in the setup, but they make the process pretty smooth. This also shows off one of the key benefits of notebooks: documentation and code together.


Develop BDC PySpark Jobs in Visual Studio Code

Jenny Jiang announces a new capability in Visual Studio Code:

With the Visual Studio Code extension, you can enjoy native Python programming experiences such as linting, debugging support, language service, and so on. You can run the current line, run selected lines of code, or run all for your PY file. You can import and export a .ipynb notebook and perform notebook-like operations including Run Cell, Run Above, or Run Below. You can also enjoy a notebook-like interactive experience that includes your source code and markdown comments along with the running results and output. You can remove unneeded sections, enter comments, or type additional code in the interactive results window. Moreover, you can visualize your results in a graphic format through matplotlib, like in a Jupyter Notebook. The integration with SQL Server 2019 Big Data Clusters empowers you to quickly submit a PySpark batch job to the big data cluster and monitor job progress.
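To make that cell-based workflow concrete, here is a minimal sketch of a PY file split into runnable cells with # %% markers (it assumes a local Spark installation plus pandas and matplotlib; submitting the job to a Big Data Cluster itself goes through the extension’s own commands):

# %% Cell 1 - build a small Spark DataFrame
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("cell-demo").getOrCreate()
df = spark.range(1000).toDF("n")

# %% Cell 2 - aggregate, then pull the result down to pandas
counts = df.selectExpr("n % 10 AS bucket").groupBy("bucket").count().toPandas()

# %% Cell 3 - visualize in the interactive results window via matplotlib
import matplotlib.pyplot as plt

counts.sort_values("bucket").plot.bar(x="bucket", y="count")
plt.show()

Each # %% marker delimits a cell, which is what Run Cell, Run Above, and Run Below operate on.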

This is rather useful for developers, though I greatly prefer the Azure Data Studio notebook interface.


IDEs and Cloudera Data Science Workbench

Bethann Noble walks us through some of the options available for IDEs operating against Cloudera Data Science Workbench:

Other coders on the team including ML and DevOps engineers often work in local IDEs such as PyCharm.  These applications run locally on the user’s computer and connect to CDSW remotely over SSH for code completion and execution.  They must be configured per user and are not associated at the project level in CDSW. The documentation provides sample instructions for the Professional Edition of PyCharm v2019.1.

They support both browser-based and local IDEs.


The SQL Notebook Experience, Featuring Powershell

Rob Sewell takes a break from book-writing and talks about using Powershell in SQL Notebooks:

Yes, it’s funny, but it also carries a serious warning. Without understanding what it is doing, please don’t enable PowerShell to be run in a SQL Notebook that someone sent you in an email or that you find on GitHub. In the same way that you don’t open the Word document attachment which will get a thousand million trillion pounddollars into your bank account, or run code you copy from the internet on production without understanding what it does, this could be a very dangerous thing to do.

With that warning out of the way, there are loads of really useful and fantastic use cases for this. SQL Notebooks make great runbooks or incident response recorders, and PowerShell is an obvious tool for this. (If only we could save the PowerShell output in a SQL Notebook, this would be even better.)
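For context on how a notebook comes to run PowerShell at all, here is a sketch of the general Jupyter mechanism (not necessarily Rob’s exact approach): a .ipynb file is plain JSON, and the kernelspec in its metadata decides which kernel executes the cells. Retargeting a notebook at an installed PowerShell kernel could look like this, with runbook.ipynb as a placeholder name:

# Sketch only: rewrites a notebook's kernelspec so an installed
# Jupyter PowerShell kernel executes its cells (not necessarily
# the approach Rob takes in his post).
import json

with open("runbook.ipynb", encoding="utf-8") as f:
    nb = json.load(f)

nb["metadata"]["kernelspec"] = {
    "name": "powershell",          # assumes a PowerShell kernel is registered
    "display_name": "PowerShell",
    "language": "powershell",
}

with open("runbook.ipynb", "w", encoding="utf-8") as f:
    json.dump(nb, f, indent=1)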

“It’s a bit hacky” is a generous statement, but it’s really cool that Rob figured out a way to do this. There is a Powershell kernel for Jupyter, but I’ve not had the best experience adding new kernels to Azure Data Studio (at least not F#’s kernel, which I tried).


Notebooks in Azure Databricks

Brad Llewellyn takes us through Azure Databricks notebooks:

Azure Databricks Notebooks support four programming languages: Python, Scala, SQL, and R.  However, selecting a language in this drop-down doesn’t limit us to only using that language.  Instead, it sets the default language of the notebook.  Every code block in the notebook is run independently, and we can manually specify the language for each code block.

Before we get to the actual coding, we need to attach our new notebook to an existing cluster.  As we said, Notebooks are nothing more than an interface for interactive code.  The processing is all done on the underlying cluster.
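To make the per-cell language point concrete, here is a sketch of two cells from a Python-default Databricks notebook. The %sql cell is shown as comments below so this reads as one Python listing, but in the notebook the magic line and the query are the literal cell contents (spark comes predefined in Databricks):

# Cell 1 - runs in the notebook's default language (Python here)
df = spark.range(100).toDF("n")
df.createOrReplaceTempView("numbers")

# Cell 2 - a leading language magic overrides the default for this cell only:
# %sql
# SELECT n % 10 AS bucket, COUNT(*) AS cnt
# FROM numbers
# GROUP BY n % 10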

Read on to learn how Databricks uses the notebook metaphor heavily in how you interact with it.


July Azure Data Studio Update

Alan Yu announces some great things in the July update to Azure Data Studio:

One of the most requested features from customers around the world is enhanced execution plan support. Although we have basic query plan support in Azure Data Studio, it’s not as robust as similar functionality built into SQL Server Management Studio and what other vendors provide.

Today, we’re pleased to announce that one of our valued Microsoft partners, SentryOne, is shipping their SentryOne Plan Explorer extension for Azure Data Studio. This is a free extension that provides enhanced plan diagrams for queries that are run in Azure Data Studio, with optimized layout algorithms and intuitive color-coding to help quickly identify the most expensive operators affecting query performance.

The other big thing I like is that notebooks have keyboard shortcuts. These were two of the things keeping me from using ADS as much as I’d wanted. Now I’m that much closer to full-on migration.


Using Notebooks with ElasticMapReduce

Vignesh Rajamani and Nikki Rouda show off ElasticMapReduce Notebooks:

One of the useful features of EMR Notebooks is the separation of the notebook environment from your underlying cluster infrastructure. The separation makes it easy for you to execute notebook code against transient clusters without worrying about deploying or configuring your notebook infrastructure every time you bring up a new cluster. You can create multiple serverless notebooks from the AWS Management Console for EMR and access the notebook UI without spending time setting up SSH access or configuring your browser for port-forwarding. Each notebook you create is launched instantly with its own Spark context. This capability enables you to attach multiple notebooks to a single shared cluster and submit parallel jobs without fear of job conflicts in a multi-tenant environment. This way you make efficient use of your clusters.

You can also connect EMR Notebooks to an EMR cluster as small as a single node. This gives you a budget-friendly sandbox environment to develop your Spark application.
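As a hedged sketch of that budget-sandbox idea, boto3 can stand up a single-node cluster for a notebook to attach to (the name, region, release label, and roles below are illustrative assumptions; Spark and Livy are included because EMR Notebooks communicate with the cluster through Livy):

import boto3

emr = boto3.client("emr", region_name="us-east-1")

# Spin up a minimal one-node cluster that an EMR Notebook can attach to.
response = emr.run_job_flow(
    Name="notebook-sandbox",
    ReleaseLabel="emr-5.26.0",
    Applications=[{"Name": "Spark"}, {"Name": "Livy"}],
    Instances={
        "MasterInstanceType": "m5.xlarge",
        "InstanceCount": 1,
        "KeepJobFlowAliveWhenNoSteps": True,
    },
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)
print(response["JobFlowId"])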

Notebooks are everywhere. And for good reason.


Embedding Notebooks on a Website

Eduardo Pivaral shows how to embed the results of a Jupyter notebook created in Azure Data Studio on a website:

Notebooks are a functionality available in Azure Data Studio that allows you to create and share documents that may contain text, code, images, and query results. These documents are helpful for sharing database insights and creating runbooks that you can share easily.

Are you new to notebooks? Don’t know what they are used for? Want to know how to create your first notebook? Then you can get started with ADS notebooks by checking my article for MSSQLTips.com here.

Once you have created your first notebooks and shared them among your team, maybe you want to share them on your website or blog for public access. Even though you can share the file for download, you can also embed it in the HTML code.
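One way to produce embeddable HTML from a notebook is nbconvert’s Python API; this is a minimal sketch, with sample.ipynb as a placeholder name:

# Render a notebook to standalone HTML (assumes nbconvert is installed).
from nbconvert import HTMLExporter

body, resources = HTMLExporter().from_filename("sample.ipynb")

# The rendered page can be pasted into a post or served in an <iframe>.
with open("sample.html", "w", encoding="utf-8") as f:
    f.write(body)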

Be sure to read the comments too. Rendering notebooks is…an imperfect operation.


Azure Data Studio May Release

Alan Yu announces the May release of Azure Data Studio:

Since its release two months ago, the community continues to love SQL Notebooks. This month, we had a laser focus on quality-of-life bug fixes instead of new features. These improvements include:

– Markdown rendering improvements, including better support for notes and tables
– Usability improvements to the toolbar
– Markdown links for trusted notebooks no longer require Command/Ctrl + click and can be clicked directly
– Improvements in cleaning up Jupyter processes after closing notebooks and reducing errors when starting multiple notebooks concurrently
– Improvements to SQL Notebook connections to ensure errors don’t occur when running two notebooks against the same database
– Improvements to notebook auto-scrolling to the currently executing cell when clicking the run cells button from the toolbar
– General stability and performance improvements

And based on some of the GitHub comments, I’m going to really like the June release if those changes all make it in.


Automating Jupyter Notebooks

I have some early thoughts on automating Jupyter notebooks:

In the command above, I included the date of execution. That way, I can script this to run once a day, storing results in an HTML file in some directory. Then, I can compare results over time and see when issues popped up.

I can also parse the resultant HTML if need be. Note that this won’t be trivial: even though the output looks like a simple [1] "PROBLEM ALERT", there’s a more complicated HTML blob. 
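A minimal sketch of that daily run, with daily-checks.ipynb and the output naming as illustrative placeholders:

# Execute the notebook and write a date-stamped HTML report,
# suitable for a cron job or scheduled task.
import datetime
import subprocess

today = datetime.date.today().isoformat()
subprocess.run(
    [
        "jupyter", "nbconvert",
        "--to", "html",
        "--execute", "daily-checks.ipynb",
        "--output", f"daily-checks-{today}.html",
    ],
    check=True,
)

The date-stamped files can then be diffed or parsed later to see when a check started failing.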

At some point I’ll probably have follow-up thoughts on the topic. Probably.
