Hadoop 3.0 Ships

Alex Woodie reports that Hadoop 3.0 is officially out there, and looks at what’s forthcoming in 3.1 and 3.2:

As we told you about last week, Hadoop 3.0 brings two big new features that are compelling in their own right. That includes support for erasure coding, which should boost storage efficiency by 50% thanks to more efficient data replication; and YARN Federation, which should allow Hadoop clusters to scale up to 40,000 nodes.

The delivery of Hadoop 3.0 shows that open open source community is responding to demands of industry, said Doug Cutting, original co-creator of Apache Hadoop and the chief architect at Cloudera.

“It’s tremendous to see this significant progress, from the raw tool of eleven years ago, to the mature software in today’s release,” he said in a press release.  “With this milestone, Hadoop better meets the requirements of its growing role in enterprise data systems.

But some of the new features in Hadoop 3.0 weren’t designed to bring immediate rewards to users. Instead, they pave the way for the Apache Hadoop community to deliver more compelling features with versions 3.1 and versions 3.2, according to  Hortonworks director of engineering Vinod Kumar Vavilapalli, who’s also a committer on the Apache Hadoop project.

“Hadoop 3.0 is actually a building block, a foundation, for more exciting things to come in 3.1 and 3.2,” he said.

Click through to see some of those exciting things.

Related Posts

Deploying Cloudera Enterprise On Azure

Xavier Morera announces a new Cloudera course: You will start by learning the Microsoft Azure services required to deploy a secure, elastic, Cloudera Enterprise cluster. These core services include security, networking, virtual machine management, and storage, just to name a few. Then, you’ll learn best practices and patterns for cloud-based clusters, including tips and caveats for security […]

Read More

Working With The Databricks API Via Powershell

Gerhard Brueckl has a Powershell module for interacting with Databricks, either Azure or AWS: As most of our deployments use PowerShell I wrote some cmdlets to easily work with the Databricks API in my scripts. These included managing clusters (create, start, stop, …), deploying content/notebooks, adding secrets, executing jobs/notebooks, etc. After some time I ended […]

Read More

Categories

December 2017
MTWTFSS
« Nov Jan »
 123
45678910
11121314151617
18192021222324
25262728293031