Notebook Practices

Jonathan Whitmore has good practices for managing Jupyter notebooks:

Here’s an example of how we use git and GitHub. One beautiful new feature of GitHub is that they now render Jupyter Notebooks automatically in repositories.

When we do our analysis, we do internal reviews of our code and our data science output. We do this with a traditional pull-request approach. When issuing pull-requests, however, looking at the differences between updated .ipynb files, the updates are not rendered in a helpful way. One solution people tend to recommend is to commit the conversion to .py instead. This is great for seeing the differences in the input code (while jettisoning the output), and is useful for seeing the changes. However, when reviewing data science work, it is also incredibly important to see the output itself.
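To make the trade-off in that last point concrete, here is a minimal sketch of what a conversion to .py keeps and what it drops. It assumes the version 4 notebook format, where an .ipynb file is JSON with a list of cells, and it uses only the standard library. The function name `notebook_to_script` and the sample notebook are hypothetical, made up for illustration; this is not Whitmore's tooling.

```python
import json


def notebook_to_script(nb):
    """Return the code cells of a v4 notebook dict as plain Python source."""
    chunks = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        # The notebook format stores source as a string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        chunks.append(source.rstrip("\n"))
    return "\n\n".join(chunks) + "\n"


# Hypothetical notebook, kept in memory so the example is self-contained.
example_nb = {
    "nbformat": 4,
    "nbformat_minor": 0,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Analysis\n"]},
        {
            "cell_type": "code",
            "execution_count": 1,
            "metadata": {},
            "source": ["total = sum([1, 2, 3])\n", "print(total)"],
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": ["6\n"]}
            ],
        },
    ],
}

# Round-trip through JSON to mimic reading an .ipynb file from disk.
nb = json.loads(json.dumps(example_nb))
script = notebook_to_script(nb)
print(script)
```

The resulting script diffs cleanly in a pull request, but the stored output of the cell is gone, which is exactly the gap the quote describes. In practice, `jupyter nbconvert --to script` performs this kind of conversion on real notebook files.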

So far, I’ve treated notebooks more as presentation media and used tools like RStudio for tinkering. This shifts my priors a bit.

