Comparing Hadoop On EC2 To EMR

Mark Litwintschik looks at EC2 versus EMR in terms of performance, specifically targeting a solution at less than $3 per hour:

The $3.00 price point was driven by the first method: running a single-node Hadoop installation. I wanted to make sure the dataset used in this benchmark could easily fit into memory.

The price set the limit for the second method: AWS EMR. This is Amazon’s Hadoop Platform offering. It has a huge feature set but the key one is that it lets you setup Hadoop clusters with very little instruction. The $3.00 price limit includes the service fee for EMR.

Note I’ll be running everything in AWS’ Irish region and the prices mentioned are region- and in the case of spot prices, time-specific.

I was a bit surprised at which service won.

Related Posts

Jupyter Notebooks In Azure

Steve Jones looks at using Jupyter Notebooks in Azure: There’s a new feature in Azure, and I stumbled on it when someone posted a link on Twitter. Apologies, I can’t remember who, but I did click on the Azure Notebooks link and was intrigued. I’ve gotten Jupyter notebooks running on my local laptop, but these are often […]

Read More

Uploading Data Sets To Azure ML From R

Leila Etaati continues her series on the Azure ML R package by showing how to upload a data set: There is a function in AzureML package name “workspace” that creates a reference to an AzureML Studio workspace by getting the authentication token and workspace id as below: 1 ws <– workspace( id , auth  ) to […]

Read More

Categories

March 2018
MTWTFSS
« Feb Apr »
 1234
567891011
12131415161718
19202122232425
262728293031