Hadoop In The Cloud

Peter Coates talks about pros and cons to Hadoop in the cloud:

Hadoop was developed for deployment over Linux running on bare metal. Cloud deployment implies virtual machines, and for Hadoop it’s a huge difference.

As detailed in other articles (for instance, Your Cluster Is an Appliance or Understanding Hadoop Hardware Requirements), bare-metal deployments have an inherent advantage over virtual machine deployments. The biggest of these is that they can use direct attached storage, i.e., local disks.

Not every Hadoop workload is storage I/O bound, but most are, and even when Hadoop seems to be CPU bound, much of the CPU activity is often either directly in service of I/O, i.e., marshaling, unmarshaling, compression, etc., or in service of avoiding I/O, i.e., building in-memory tables for map-side joins.

Read the whole thing.

Related Posts

Databricks Runtime 5.4

Todd Greenstein announces Databricks Runtime 5.4: We’ve partnered with the Data Services team at Amazon to bring the Glue Catalog to Databricks.   Databricks Runtime can now use Glue as a drop-in replacement for the Hive metastore. This provides several immediate benefits:– Simplifies manageability by using the same glue catalog across multiple Databricks workspaces.– Simplifies integrated […]

Read More

Building an AKS Cluster

Mohammad Darab continues a series on Big Data Clusters by creating a Kubernetes pod in Azure Kubernetes Service: Next, we will create a resource group by executing the following command:az group create –name nameOfMyresourceGroup –location eastus2 Once you execute the above command, you can go into the Azure portal and refresh your resource group pane […]

Read More

Categories

February 2017
MTWTFSS
« Jan Mar »
 12345
6789101112
13141516171819
20212223242526
2728