Notebook Practices

Jonathan Whitmore has good practices for managing Jupyter notebooks:

Here’s an example of how we use git and GitHub. One beautiful new feature of Github is that they now render Jupyter Notebooks automatically in repositories.

When we do our analysis, we do internal reviews of our code and our data science output. We do this with a traditional pull-request approach. When issuing pull-requests, however, looking at the differences between updated .ipynb files, the updates are not rendered in a helpful way. One solution people tend to recommend is to commit the conversion to .py instead. This is great for seeing the differences in the input code (while jettisoning the output), and is useful for seeing the changes. However, when reviewing data science work, it is also incredibly important to see the output itself.

So far, I’ve treated notebooks more as presentation media and used tools like R Studio for tinkering.  This shifts my priors a bit.

Related Posts

Creating Big Data Clusters with Azure Data Studio

Niels Berglund takes us through the creation of a Big Data Cluster by using Azure Data Studio to generate a notebook: I wrote a blog post back in November 2018, about how to install and deploy SQL Server 2019 Big Data Cluster on Azure Kubernetes Service. Back then SQL Server 2019 Big Data Cluster was in private preview, (CTP 2.1 I […]

Read More

Develop BDC PySpark Jobs in Visual Studio Code

Jenny Jiang announces a new capability in Visual Studio Code: With the Visual Studio Code extension, you can enjoy native Python programming experiences such as linting, debugging support, language service, and so on. You can run current line, run selected lines of code, or run all for your PY file. You can import and export a .ipynb notebook and perform […]

Read More

Categories

September 2016
MTWTFSS
« Aug Oct »
 1234
567891011
12131415161718
19202122232425
2627282930