Press "Enter" to skip to content

The Spark Starter Guide

Landon Robinson has some good news for us:

If you visit hadoopsters.com/spark or thesparkguide.com, you’ll see something new and exciting from us. It’s official: we’ve written and are publishing a comprehensive guide to Apache Spark.

This guide will be completely online and completely free. A book’s worth of content, containing exercises in Python and Scala to teach you Spark, at your fingertips. Again, free.

Landon has posted chapter 1, section 1 already:

This section introduces the concept of data pipelines – how data is processed from one form into another. It’s also the generic term used to describe how data moves from one location or form, and is consumed, altered, transformed, and delivered to another location or form.

You’ll be introduced to Spark functions like joinfilter, and aggregate to process data in a variety of forms. You’ll learn it all through interactive Spark exercises in Scala and Python.

This is very early in the process but I’m excited.