Getting Started With Spark

Kevin Feasel

2016-07-01

Spark

Denny Lee announces a new Spark intro guide:

We are proud to introduce the Getting Started with Apache Spark on Databricks Guide. This step-by-step guide illustrates how to leverage the Databricks’ platform to work with Apache Spark. Our just-in-time data platform simplifies common challenges when working with Spark: data integration, real-time experimentation, and robust deployment of production applications.

Databricks provides a simple, just-in-time data platform designed for data analysts, data scientists, and engineers. Using Databricks, this step-by-step guide helps you solve real-world Data Sciences and Data Engineering scenarios with Apache Spark. It will help you familiarize yourself with the Spark UI, learn how to create Spark jobs, load data and work with Datasets, get familiar with Spark’s DataFrames and Datasets API, run machine learning algorithms, and understand the basic concepts behind Spark Streaming.

If you are at all interested in distributed databases, Spark is a must-learn.

Related Posts

The Transaction Log in Delta Tables

Burak Yavuz, et al, explain how the transaction log works with Delta Tables in Apache Spark: When a user creates a Delta Lake table, that table’s transaction log is automatically created in the _delta_log subdirectory. As he or she makes changes to that table, those changes are recorded as ordered, atomic commits in the transaction log. Each commit […]

Read More

Spark for .NET Developers

Ed Elliott has a long-form post covering spark-dotnet: The .NET driver is made up of two parts, and the first part is a Java JAR file which is loaded by Spark and then runs the .NET application. The second part of the .NET driver runs in the process and acts as a proxy between the […]

Read More

Categories

July 2016
MTWTFSS
« Jun Aug »
 123
45678910
11121314151617
18192021222324
25262728293031