Exploring The MNIST Dataset

David Robinson performs exploratory data analysis on the MNIST digit database:

The challenge is to classify a handwritten digit based on a 28-by-28 black and white image. MNIST is often credited as one of the first datasets to prove the effectiveness of neural networks.

In a series of posts, I’ll be training classifiers to recognize digits from images, while using data exploration and visualization to build our intuitions about why each method works or doesn’t. Like most of my posts, I’ll be analyzing the data through tidy principles, particularly using the dplyr, tidyr and ggplot2 packages. In this first post we’ll focus on exploratory data analysis, to show how you can better understand your data before you start training classification algorithms or measuring accuracy. This will help when we’re choosing a model or transforming our features.
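
To make the "tidy" idea concrete, here is a minimal sketch of what one-row-per-pixel data looks like and how a per-digit average image could be computed from it. This is not the code from the original post, which uses R with dplyr and tidyr. It is a plain-Python analogue, and the function names (`to_tidy`, `mean_by_label_pixel`) and the tiny 2-by-2 "images" are made up for illustration rather than taken from MNIST.

```python
from collections import defaultdict


def to_tidy(images, labels):
    """Reshape images (lists of pixel rows) into one record per pixel."""
    rows = []
    for instance, (image, label) in enumerate(zip(images, labels)):
        for y, pixel_row in enumerate(image):
            for x, value in enumerate(pixel_row):
                rows.append(
                    {"instance": instance, "label": label,
                     "x": x, "y": y, "value": value}
                )
    return rows


def mean_by_label_pixel(tidy_rows):
    """Average pixel value for each (label, x, y) group."""
    totals = defaultdict(lambda: [0, 0])  # key -> [sum, count]
    for r in tidy_rows:
        key = (r["label"], r["x"], r["y"])
        totals[key][0] += r["value"]
        totals[key][1] += 1
    return {key: s / n for key, (s, n) in totals.items()}


# Synthetic 2x2 stand-ins for 28x28 digit images (illustrative data only).
images = [
    [[0, 255], [0, 255]],
    [[0, 205], [0, 255]],
    [[255, 255], [255, 0]],
]
labels = [1, 1, 7]

tidy = to_tidy(images, labels)
centroids = mean_by_label_pixel(tidy)

print(len(tidy))             # 3 images x 4 pixels = 12 rows
print(centroids[(1, 1, 0)])  # mean of 255 and 205 for label 1 at x=1, y=0
```

Once every pixel is a row, grouping by label and position is an ordinary aggregation, which is the kind of operation a group-and-summarize workflow is built around.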

Read on for the analysis.
