Getting Started With Spark

Kevin Feasel

2016-07-01

Spark

Denny Lee announces a new Spark intro guide:

We are proud to introduce the Getting Started with Apache Spark on Databricks Guide. This step-by-step guide illustrates how to leverage the Databricks’ platform to work with Apache Spark. Our just-in-time data platform simplifies common challenges when working with Spark: data integration, real-time experimentation, and robust deployment of production applications.

Databricks provides a simple, just-in-time data platform designed for data analysts, data scientists, and engineers. Using Databricks, this step-by-step guide helps you solve real-world Data Sciences and Data Engineering scenarios with Apache Spark. It will help you familiarize yourself with the Spark UI, learn how to create Spark jobs, load data and work with Datasets, get familiar with Spark’s DataFrames and Datasets API, run machine learning algorithms, and understand the basic concepts behind Spark Streaming.

If you are at all interested in distributed databases, Spark is a must-learn.

Related Posts

Working With The Databricks API Via Powershell

Gerhard Brueckl has a Powershell module for interacting with Databricks, either Azure or AWS: As most of our deployments use PowerShell I wrote some cmdlets to easily work with the Databricks API in my scripts. These included managing clusters (create, start, stop, …), deploying content/notebooks, adding secrets, executing jobs/notebooks, etc. After some time I ended […]

Read More

Tuning Apache Spark Applications

Vidisha Gupta has a few tips for tuning Apache Spark programs: Data Serialization – Serialization plays an important role in increasing the performance of any application. Spark provides two serialization libraries – Java Serialization: By default, spark uses Java’s ObjectOutputStream framework which can work with any class that implements java.io.serializable. This serialization is flexible but slow and […]

Read More

Categories

July 2016
MTWTFSS
« Jun Aug »
 123
45678910
11121314151617
18192021222324
25262728293031