Neural Nets On Spark

Nisha Muktewar and Seth Hendrickson show how to use Deeplearning4j to build deep learning models on Hadoop and Spark:

Modern convolutional networks can have several hundred million parameters. One of the top-performing neural networks in the Large Scale Visual Recognition Challenge (also known as “ImageNet”), has 140 million parameters to train! These networks not only take a lot of compute and storage resources (even with a cluster of GPUs, they can take weeks to train), but also require a lot of data. With only 30000 images, it is not practical to train such a complex model on Caltech-256 as there are not enough examples to adequately learn so many parameters. Instead, it is better to employ a method called transfer learning, which involves taking a pre-trained model and repurposing it for other use cases. Transfer learning can also greatly reduce the computational burden and remove the need for large swaths of specialized compute resources like GPUs.

It is possible to repurpose these models because convolutional neural networks tend to learn very general features when trained on image datasets, and this type of feature learning is often useful on other image datasets. For example, a network trained on ImageNet is likely to have learned how to recognize shapes, facial features, patterns, text, and so on, which will no doubt be useful for the Caltech-256 dataset.

This is a longer post, but on an extremely interesting topic.

Related Posts

Anomaly Detection With Kafka Streams

Ajmal Karuthakantakath shows us an application which performs fairly simple anomaly detection using Kafka Streams: The problem is in the banking loan payment domain, where customers have taken a loan and they need to make monthly payments to repay the loan amount. Assume there are millions of customers in the system and all these customers need […]

Read More

Unintentional Data

Eric Hollingsworth describes data science as the cost of collecting data approaches zero: Thankfully not only have modern data analysis tools made data collection cheap and easy, they have made the process of exploratory data analysis cheaper and easier as well. Yet when we use these tools to explore data and look for anomalies or […]

Read More


June 2017
« May Jul »