Press "Enter" to skip to content

Category: Notebooks

A New Notebook Tool: Polynote

Jeremy Smith, et al., announce a new product:

We are pleased to announce the open-source launch of Polynote: a new, polyglot notebook with first-class Scala support, Apache Spark integration, multi-language interoperability including Scala, Python, and SQL, as-you-type autocomplete, and more.

Polynote provides data scientists and machine learning researchers with a notebook environment that allows them the freedom to seamlessly integrate our JVM-based ML platform — which makes heavy use of Scala — with the Python ecosystem’s popular machine learning and visualization libraries. It has seen substantial adoption among Netflix’s personalization and recommendation teams, and it is now being integrated with the rest of our research platform.

There are some nice pieces to it, especially around language interop.


Financial Time Series Analysis in Databricks

Ricardo Portilla shares a demo of financial time series analysis in Databricks:

We’ve shown a merging technique above, so now let’s focus on a standard aggregation, namely Volume-Weighted Average Price (VWAP), which is the average price weighted by volume. This metric is an indicator of the trend and value of the security throughout the day.  The vwap function within our wrapper class (in the attached notebook) shows where the VWAP falls above or below the trading price of the security. In particular, we can now identify the window during which the VWAP (in orange) falls below the trade price, showing that the stock is overbought.

Click through for the article, as well as a notebook you can try out.
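
If you want a feel for the technique before opening the notebook, here is a minimal sketch of a VWAP calculation in PySpark. The column names and sample rows are my own illustration, not taken from the attached notebook:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("vwap-sketch").getOrCreate()

# Toy trade data; in the article this comes from real tick data.
trades = spark.createDataFrame(
    [("AAPL", "2019-10-01 09:30:01", 224.59, 100),
     ("AAPL", "2019-10-01 09:30:05", 224.61, 250),
     ("AAPL", "2019-10-01 09:31:02", 224.48, 400)],
    ["symbol", "event_ts", "price", "volume"],
)

# VWAP = sum(price * volume) / sum(volume), computed here per symbol
# per one-minute window.
vwap = (
    trades
    .withColumn("event_ts", F.to_timestamp("event_ts"))
    .groupBy("symbol", F.window("event_ts", "1 minute"))
    .agg((F.sum(F.col("price") * F.col("volume")) / F.sum("volume")).alias("vwap"))
)

vwap.show(truncate=False)
```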


Automating Azure Data Studio Notebooks

Aaron Nelson has two separate ways of scheduling Azure Data Studio notebooks for us:

There are two new options for automating your SQL Notebooks with your SQL Servers. Earlier this month, the Insiders build of Azure Data Studio received the ability to add SQL Notebooks in SQL Agent. This past Friday (September 20th, 2019) a new version of the SqlServer PowerShell module was posted to the Gallery, with a new Invoke-SqlNotebook cmdlet.

Read on for demos of both.


Creating Big Data Clusters with Azure Data Studio

Niels Berglund takes us through the creation of a Big Data Cluster by using Azure Data Studio to generate a notebook:

I wrote a blog post back in November 2018 about how to install and deploy SQL Server 2019 Big Data Cluster on Azure Kubernetes Service. Back then, SQL Server 2019 Big Data Cluster was in private preview (CTP 2.1, I believe), and you had to sign up to get access to the “bits”. Well, you did not really get any “bits”; what you did get was access to Python deployment scripts.

Now, September 2019, the BDC is in public preview (you do not have to sign up), and it has reached Release Candidate (RC) status, RC 1. The install method has changed, or rather, in addition to installing via deployment scripts, you can now also install using Azure Data Studio deployment notebooks, and that is what this blog post is about.

Having gone through this myself, I can say there’s quite a bit of reading involved in the setup, but they make the process pretty smooth. This also shows off one of the key benefits of notebooks: documentation and code living together.


Develop BDC PySpark Jobs in Visual Studio Code

Jenny Jiang announces a new capability in Visual Studio Code:

With the Visual Studio Code extension, you can enjoy native Python programming experiences such as linting, debugging support, language service, and so on. You can run the current line, run selected lines of code, or run all for your PY file. You can import and export a .ipynb notebook and perform a notebook-like query including Run Cell, Run Above, or Run Below. You can also enjoy a notebook-like interactive experience that includes your source code and markdown comments along with the running results and output. You can remove the unneeded sections, enter comments, or type additional code in the interactive results window. Moreover, you can visualize your results in a graphic format through matplotlib, just like in a Jupyter Notebook. The integration with SQL Server 2019 Big Data Clusters empowers you to quickly submit a PySpark batch job to the big data cluster and monitor job progress.

This is rather useful for developers, though I greatly prefer the Azure Data Studio notebook interface.
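
For context, the kind of job you would submit this way is just an ordinary PySpark script. Here is a minimal sketch; the file path and column name are made up for illustration:

```python
from pyspark.sql import SparkSession

# A small PySpark batch job of the sort you might submit from VS Code
# to a Big Data Cluster. The path and column name are illustrative only.
spark = SparkSession.builder.appName("bdc-batch-demo").getOrCreate()

df = spark.read.csv("/tmp/clickstream.csv", header=True, inferSchema=True)

# A simple aggregation: top ten pages by hit count.
(df.groupBy("page")
   .count()
   .orderBy("count", ascending=False)
   .show(10))

spark.stop()
```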


IDEs and Cloudera Data Science Workbench

Bethann Noble walks us through some of the options available for IDEs operating against Cloudera Data Science Workbench:

Other coders on the team including ML and DevOps engineers often work in local IDEs such as PyCharm.  These applications run locally on the user’s computer and connect to CDSW remotely over SSH for code completion and execution.  They must be configured per user and are not associated at the project level in CDSW. The documentation provides sample instructions for the Professional Edition of PyCharm v2019.1.

They support both browser-based and local IDEs.


The SQL Notebook Experience, Featuring Powershell

Rob Sewell takes a break from book-writing and talks about using Powershell in SQL Notebooks:

Yes, it’s funny but also it carries a serious warning. Without understanding what it is doing, please don’t enable PowerShell to be run in a SQL Notebook that someone sent you in an email or you find on a GitHub. In the same way as you don’t open the Word document attachment which will get a thousand million trillion pounddollars into your bank account or run code you copy from the internet on production without understanding what it does, this could be a very dangerous thing to do.

With that warning out of the way, there are loads of really useful and fantastic use cases for this. SQL Notebooks make great run-books or incident response recorders and PowerShell is an obvious tool for this. (If only we could save the PowerShell output in a SQL Notebook, this would be even better)

“It’s a bit hacky” is a generous statement, but it’s really cool that Rob figured out a way to do this. There is a Powershell kernel for Jupyter, but I’ve not had the best experience adding new kernels to Azure Data Studio (at least not F#’s kernel, which I tried).
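
For a sense of the shape of the hack: one way to get PowerShell output into a notebook is to shell out from a Python code cell. This is my own sketch of the general idea, not necessarily the mechanism Rob uses:

```python
import subprocess

# Run a PowerShell command from a Python notebook cell and capture the
# text output. 'pwsh' is PowerShell Core; on Windows with Windows
# PowerShell you would call 'powershell' instead.
result = subprocess.run(
    ["pwsh", "-NoProfile", "-Command", "Get-Process | Select-Object -First 5"],
    capture_output=True,
    text=True,
)

print(result.stdout)
```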


Notebooks in Azure Databricks

Brad Llewellyn takes us through Azure Databricks notebooks:

Azure Databricks Notebooks support four programming languages: Python, Scala, SQL, and R. However, selecting a language in this drop-down doesn’t limit us to only using that language. Instead, it makes that the default language of the notebook. Every code block in the notebook is run independently, and we can manually specify the language for each code block.

Before we get to the actual coding, we need to attach our new notebook to an existing cluster. As we said, notebooks are nothing more than an interface for interactive code. The processing is all done on the underlying cluster.

Read on to learn how heavily Databricks leans on the notebook metaphor in how you interact with it.
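
As a quick illustration of the per-cell language idea, here is my own sketch. Databricks notebooks predefine the spark session, and the view name below is made up:

```python
# Python cell in a notebook whose default language is Python.
# 'spark' is provided by Databricks; no session builder is needed.
df = spark.range(1, 6).toDF("n")

# Temp views are a common way to hand data to cells in other languages.
df.createOrReplaceTempView("numbers")

# A separate cell can override the default language with a magic command:
# %sql
# SELECT n, n * n AS n_squared FROM numbers

# Or switch to Scala the same way:
# %scala
# display(spark.table("numbers"))
```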


July Azure Data Studio Update

Alan Yu announces some great things in the July update to Azure Data Studio:

One of the most requested features from customers around the world is enhanced execution plan support. Although we have basic query plan support in Azure Data Studio, it’s not as robust as similar functionality built into SQL Server Management Studio and what other vendors provide.

Today, we’re pleased to announce that one of our valued Microsoft partners, SentryOne, is shipping their SentryOne Plan Explorer extension for Azure Data Studio. This is a free extension that provides enhanced plan diagrams for queries that are run in Azure Data Studio, with optimized layout algorithms and intuitive color-coding to help quickly identify the most expensive operators affecting query performance.

The other big thing I like is that notebooks have keyboard shortcuts. These were two of the things keeping me from using ADS as much as I’d wanted. Now I’m that much closer to full-on migration.
