What’s New In Hadoop 3.0?

Shubham Sinha explains some of the changes coming to Hadoop:

Integrating EC with HDFS can maintain the same fault-tolerance with improved storage efficiency. As an example, a 3x replicated file with 6 blocks will consume 6*3 = 18 blocks of disk space. But with EC (6 data, 3 parity) deployment, it will only consume 9 blocks (6 data blocks + 3 parity blocks) of disk space. This only requires the storage overhead up to 50%.

Since Erasure coding requires additional overhead in the reconstruction of the data due to performing remote reads, thus it is generally used for storing less frequently accessed data. Before deploying Erasure code, users should consider all the overheads like storage, network and CPU overheads of erasure coding.

Now to support the Erasure Coding effectively in HDFS they made some changes in the architecture. Lets us take a look at the architectural changes.

There are some nice features coming to Hadoop version 3.

Related Posts

From pandas to Spark with koalas

Achilleus tries out Koalas: Python is widely used programming language when it comes to Data science workloads and Python has way too many different libraries to back this fact. Most of the data scientists are familiar with Python and pandas mostly. But the main issue with Pandas is it works great for small and medium […]

Read More

New Query Store Functionality in 2019

Erin Stellato is excited about SQL Server 2019 CTP 3.0: Friends, CTP 3.0 dropped today, and it includes some changes for Query Store in SQL Server 2019!  I am so excited!!  I’ve downloaded it and have WideWorldImporters installed and have a lot of testing planned, but if you’re impatient, guess what?  The documentation is already updated!  […]

Read More

Categories