Spark Overview

Jen Underwood provides an overview of the Apache Spark project:

Spark provides a comprehensive framework to manage big data processing with a variety of data set types including text and graph data. It can also handle batch pipelines and real-time streaming data. Using Spark libraries, you can create big data analytics apps in Java, Scala, Clojure, and popular R and Python languages.

Spark brings analytics pros an improved MapReduce type query capability with more performant data processing in memory or on disk. It can be used with datasets that are larger than the aggregate memory in a cluster. Spark also has savvy lazy evaluation of big data queries which helps with workflow optimization and reuse of intermediate results in memory. TheSpark API is easy to learn.

One of my taglines is, Spark is not the future of Hadoop; Spark is the present of Hadoop.  If you want to get into this space, learn how to work with Spark.

Related Posts

Security Improvements In Kafka And Confluent Platform

Vahid Fereydouny demonstrates a number of security improvements made to Apache Kafka 2.0 as well as Confluent Platform 5.0: Over the past several quarters, we have made major security enhancements to Confluent Platform, which have helped many of you safeguard your business-critical applications. With the latest release, we increased the robustness of our security feature […]

Read More

SparkSession Versus SparkContext

Abhishek Baranwal explains the differences between the SparkSession object and the SparkContext object when writing Spark code: Prior to spark 2.0, SparkContext was used as a channel to access all spark functionality. The spark driver program uses sparkContext to connect to the cluster through resource manager. SparkConf is required to create the spark context object, […]

Read More

Categories

October 2016
MTWTFSS
« Sep Nov »
 12
3456789
10111213141516
17181920212223
24252627282930
31