Press "Enter" to skip to content

Category: Notebooks

Survival Analysis Notebooks

Dan Morris, et al., walk us through a survival analysis scenario:

In contrast to other methods that may seem similar on the surface, such as linear regression, survival analysis takes censoring into account. Censoring occurs when the start and/or end of a measured value is unknown. For example, suppose our historical data includes records for the two customers below. In the case of customer A, we know the precise duration of the subscription because the customer churned in December 2020. For customer B, we know that the contract started four months ago and is still active, but we do not know how much longer they will be a customer. This is an example of right censoring because we do not yet know the end date for the measured value. Right censoring is what we most commonly see with this form of analysis.
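To make censoring concrete, here is a minimal sketch using the lifelines library (not part of the original post). The data is illustrative: customer A churned after 14 months, while customer B is 4 months in and still active, so B's duration is right-censored via the event_observed flag:

```python
# A minimal right-censoring sketch using the lifelines library.
# The durations below are illustrative, not from the original post.
from lifelines import KaplanMeierFitter

durations = [14, 4]      # months subscribed: customer A, customer B
event_observed = [1, 0]  # 1 = churned (A); 0 = still active, i.e. right-censored (B)

kmf = KaplanMeierFitter()
kmf.fit(durations, event_observed=event_observed)

# Estimated probability of a subscription surviving past each duration
print(kmf.survival_function_)
```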

Click through for an intro as well as a half-dozen notebooks.


Using Spark Pools in Azure Synapse Analytics

Rahul Mehta shows how to create and use an Apache Spark pool in Azure Synapse Analytics:

In the last part of the Azure Synapse Analytics article series, we learned how to create a dedicated SQL pool. Azure Synapse supports three different types of pools – an on-demand SQL pool, a dedicated SQL pool, and a Spark pool. Spark provides an in-memory distributed processing framework for big data analytics, which suits many big data analytics use cases. Azure Synapse Analytics provides mechanisms to use the SQL on-demand pool to query data as a service, the SQL dedicated pool for data warehousing using a distributed data processing engine, and the Spark pool for analytics using an in-memory big data processing engine. This article shows how to create a Spark pool in Azure Synapse Analytics and how to process data using it.
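For a rough sense of the kind of processing involved, here is a PySpark sketch of the sort you might run in a Synapse Spark pool notebook (the spark session is pre-created by the notebook; the storage account, container, path, and column name are placeholders, not from the article):

```python
# Illustrative cell for a Synapse Spark pool notebook.
# "spark" is the SparkSession the notebook provides; the ADLS Gen2
# account, container, file path, and column name are placeholders.
df = spark.read.load(
    "abfss://mycontainer@mystorageaccount.dfs.core.windows.net/data/sales.parquet",
    format="parquet",
)
df.printSchema()

# A simple aggregation executed on the distributed, in-memory engine
df.groupBy("region").count().show()
```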

Click through for a demo on setup and a sample notebook to get started.


Spark Streaming in a Databricks Notebook

Tomaz Kastrun shows off Spark Streaming in a Databricks notebook:

Spark Streaming is a process that can analyze not only batches of data but also streams of data in near real-time. It enables powerful interactive and analytical applications across both hot and cold data (streaming data and historical data). Spark Streaming is a fault-tolerant system: thanks to the lineage of operations, Spark always remembers where you stopped, and in case of a worker failure, another worker can recreate all the data transformations from the partitioned RDD (assuming that all the RDD transformations are deterministic).
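As a flavor of what a streaming notebook cell looks like, here is a minimal Structured Streaming sketch, not Tomaz's exact code. It uses Spark's built-in rate source, which generates test rows on a schedule, so it runs without any external stream:

```python
# Minimal Structured Streaming sketch for a Databricks notebook.
# The built-in "rate" source emits (timestamp, value) rows for testing,
# so no external event source is needed.
stream_df = (spark.readStream
             .format("rate")
             .option("rowsPerSecond", 5)
             .load())

# Sink the stream to an in-memory table we can query interactively
query = (stream_df.writeStream
         .format("memory")
         .queryName("rate_stream")
         .outputMode("append")
         .start())

# In a later cell: spark.sql("SELECT * FROM rate_stream").show()
```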

Click through for the demo.


Linking between Notebooks in Azure Data Studio

Julie Koesmarno shows us the rules of linking notebooks in Azure Data Studio:

When writing a notebook, it can be very handy to be able to refer to a specific part of a notebook and allow readers to jump to that part, i.e., linking or anchoring. Using this technique, you can also create an index list, a table of contents, or cross-references to parts of other notebooks. Check out my demo notebook for this linking topic in the MsSQLGirl GitHub repo.
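The underlying mechanism is ordinary Markdown linking: a heading in a markdown cell becomes an anchor you can target from the same notebook or from another one. A hypothetical illustration (not taken from Julie's demo notebook; file names are placeholders):

```markdown
## Setup
Steps for preparing the environment go here.

[Jump back to the Setup section](#setup)

[Open the Results section of another notebook](./another-notebook.ipynb#results)
```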

Read on for those rules.


Using Notebooks to Load Data into the Databricks File System

Tomaz Kastrun is putting together an Advent of Azure Databricks:

Yesterday, we started working on data import and how to use the drop zone to import data into DBFS. We also created our first notebook, and that is where I would like to start today, with a light introduction to notebooks.
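To make the destination concrete, here is an illustrative Databricks notebook cell that inspects the drop-zone landing area and reads an uploaded file from DBFS (dbutils and display are notebook built-ins; the file name is a placeholder):

```python
# Illustrative Databricks notebook cell; the CSV file name is a placeholder.
# List files that the UI upload ("drop zone") places under /FileStore/tables
display(dbutils.fs.ls("dbfs:/FileStore/tables/"))

# Read an uploaded CSV from DBFS into a Spark DataFrame
df = (spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("dbfs:/FileStore/tables/sample_data.csv"))
display(df)
```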

Read on for an overview of notebooks, as well as an example that loads data into the Databricks File System (DBFS).


Kusto Queries in Azure Data Studio Notebooks

Julie Koesmarno shows off the Kusto Query Language magic in Azure Data Studio notebooks:

To do this, you’ll need to ensure that you have Kqlmagic installed. See Install and set up Kqlmagic in a notebook. Then in a notebook, you can load Kqlmagic with %reload_ext Kqlmagic in a code cell.

The next step is, in a new code cell, to connect to a Log Analytics workspace. There are three ways to do so (roughly – as I'm also still learning in this space):

1. Using Azure Active Directory Device Login authentication
2. Using Az CLI login
3. Using a client secret
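Here is a rough sketch of the first option, the device-login flow; the connection-string form is my recollection of Kqlmagic's syntax rather than something from Julie's post, so treat it as an assumption and verify against the Kqlmagic documentation (the workspace ID is a placeholder):

```python
# In one code cell: load the Kqlmagic extension
%reload_ext Kqlmagic

# In another cell: connect to a Log Analytics workspace via AAD device login.
# The connection-string shape is an assumption; the workspace ID is a placeholder.
%kql loganalytics://code;workspace='00000000-0000-0000-0000-000000000000'

# Then run a KQL query; Heartbeat is a common Log Analytics table
%kql Heartbeat | take 5
```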

Read on for one example using Azure AD authentication.


Creating Jupyter Books in Azure Data Studio

Drew Skwiers-Koballa takes us through creating and deploying Jupyter Books:

The notebook experience in Azure Data Studio allows users to create and share documents containing live code, execution results, and narrative text. Potential usage includes data cleaning and transformation, statistical modeling, troubleshooting guides, data visualization, and machine learning. Jupyter books compile a collection of notebooks into a richer experience with more structure and a table of contents. In Azure Data Studio we are able not only to use Jupyter books but also to create and share them. Learn the basics of notebooks in Azure Data Studio from the documentation and read on to learn how to leverage a GitHub Action to publish and share remote Jupyter books.
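The "more structure" comes from a table-of-contents file that sits alongside the notebooks. As a hypothetical sketch of the general shape only – the titles, paths, and exact format Azure Data Studio expects are assumptions, covered properly in the linked documentation:

```yaml
# Hypothetical toc.yml sketch for a Jupyter Book; entries are illustrative.
- title: Introduction
  url: /readme
- title: Troubleshooting Guides
  sections:
    - title: High CPU Usage
      url: /notebooks/high-cpu
    - title: Query Blocking
      url: /notebooks/blocking
```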

Click through for the process of creating, opening, and distributing Jupyter Books.
