The Basics Of Notebooks

I have a quick walkthrough of notebooks:

Remember chemistry class in high school or college?  You might remember having to keep a lab notebook for your experiments.  The purpose of this notebook was two-fold:  first, so you could remember what you did and why you did each step; second, so others could repeat what you did.  A well-done lab notebook has all you need to replicate an experiment, and independent replication is a huge part of what makes hard sciences “hard.”

Take that concept and apply it to statistical analysis of data, and you get the type of notebook I’m talking about here.  You start with a data set, perform cleansing activities, potentially prune elements (e.g., getting rid of rows with missing values), calculate descriptive statistics, and apply models to the data set.

I didn’t realize just how useful notebooks were until I started using them regularly.

Related Posts

Building An Image Recognizer With R

David Smith has a post showing how to build an image recognizer with R and Microsoft’s Cognitive Services Library: The process of training an image recognition system requires LOTS of images — millions and millions of them. The process involves feeding those images into a deep neural network, and during that process the network generates […]

Read More

Checkpointing Code For Reproduction

David Smith tells an interesting story about a reproducibility problem with data analysis: Timo Grossenbacher, data journalist with Swiss Radio and TV in Zurich, had a bit of a surprise when he attempted to recreate the results of one of the R Markdown scripts published by SRF Data to accompany their data journalism story about vested interests of Swiss […]

Read More


July 2016
« Jun Aug »