Comparing Hadoop On EC2 To EMR

Mark Litwintschik looks at EC2 versus EMR in terms of performance, specifically targeting a solution at less than $3 per hour:

The $3.00 price point was driven by the first method: running a single-node Hadoop installation. I wanted to make sure the dataset used in this benchmark could easily fit into memory.

The price set the limit for the second method: AWS EMR. This is Amazon’s Hadoop Platform offering. It has a huge feature set but the key one is that it lets you setup Hadoop clusters with very little instruction. The $3.00 price limit includes the service fee for EMR.

Note I’ll be running everything in AWS’ Irish region and the prices mentioned are region- and in the case of spot prices, time-specific.

I was a bit surprised at which service won.

Related Posts

Azure SQL Database and Extended Events

Dave Bland shows how to set up and read an extended event file on Azure SQL Database: This first step when using T-SQL to read Extended Files that are stored in an Azure Storage Account is to create a database credential.  Of course the credential will provide essential security information to connect to the Azure […]

Read More

Overriding Spark Dependencies

Landon Robinson shows how to override a Spark dependency located on the classpath: This doesn’t draw the line exactly where the method changed from private to public, but generally speaking:– gson-2.2.4.jar: the method is private, and therefore too old for use here– gson-2.6.1: the method is public, and works fine.– Somewhere between the two, the […]

Read More

Categories

March 2018
MTWTFSS
« Feb Apr »
 1234
567891011
12131415161718
19202122232425
262728293031