Hadoop 3

Kevin Feasel

2016-05-24

Hadoop

Alex Woodie covers some upcoming changes with Hadoop version 3:

Hadoop 3, as it currently stands (which is subject to change), won’t look significantly different from Hadoop 2, Ajisaka said. Made generally available in the fall of 2013, Hadoop 2 was a very big deal for the open source big data platform, as it introduced the YARN scheduler, which effectively decoupled the MapReduce processing framework from HDFS, and paved the way for other processing frameworks, such as Apache Spark, to process data on Hadoop simultaneously. That has been hugely successful for the entire Hadoop ecosystem.

It appears the list of new features in Hadoop 3 is slightly less ambitious than the Hadoop 2 undertaking. According to Ajisaka’s presentation, in addition to support for erasure coding and bug fixes, Hadoop 3 currently calls for new features like:

  • shell script rewrite;
  • task-level native optimization;
  • the capability to derive heap size or MapReduce memory automatically;
  • eliminating of old features;
  • and support for more than two NameNodes.

The big benefit to erasure coding is that you can potentially cut data usage requirements in half, so that can help in very large environments.  Alex also notes that the first non-beta version of Hadoop 3 is expected to release by the end of the year.

Related Posts

Avro Schemas In Kafka

Stephane Maarek explains the value of using Apache Avro as a schema structure for your Kafka topics: Avro has support for primitive types ( int, string, long, bytes, etc…), complex types (enum, arrays, unions, optionals), logical types (dates, timestamp-millis, decimal), and data record (name and namespace). All the types you’ll ever need. Avro has support for embedded documentation. Although documentation is optional, in my workflow I […]

Read More

When Spark Meets Hive

Anna Martin and Rosaria Silipo look at combining HiveQL and SparkQL: We set our goal here to investigate the age distribution of Maine residents, men and women, using SQL queries. But the question is… on Apache Hive or on Apache Spark? Well, why not both? We could use SparkSQL to extract men’s age distribution and […]

Read More

Categories

May 2016
MTWTFSS
« Apr Jun »
 1
2345678
9101112131415
16171819202122
23242526272829
3031