Spark Clusters On Spot Pricing

Sameer Farooqui explains spot pricing with respect to AWS servers:

The idea behind Spot instances is to allow you to bid on spare Amazon EC2 compute capacity. You choose the max price you’re willing to pay per EC2 instance hour. If your bid meets or exceeds the Spot market price, you win the Spot instances. However, unlike traditional bidding, when your Spot instances start running, you pay the live Spot market price (not your bid amount). Spot prices fluctuate based on the supply and demand of available EC2 compute capacity and are specific to different regions and availability zones.

So, although you may have bid 0.55 cents per hour for a r3.2xlarge instance, you’ll end up paying only 0.10 cents an hour if that’s what the going rate is for the region and availability zone.

Databricks uses spot pricing for Community Edition clusters to control costs.  Click through for a very interesting discussion of spot pricing and how they take advantage of it.

Related Posts

Serverless Lambda Architecture

Laith Al-Saadoon shows off a new Amazon Web Services product, AWS Glue, which allows you to build a data processing system on the Lambda architecture without directly provisioning any EC2 instances: With the launch of AWS Glue, AWS provides a portfolio of services to architect a Big Data platform without managing any servers or clusters. […]

Read More

Instant Log Initialization In Azure

Dimitri Furman shows a benefit of creating database files with Azure Blob Storage: Recently, we were working on a performance testing exercise using a SQL Server database with files in Azure Blob Storage. After creating the database using the default 8 MB size for data and log file (as in the example above), we wanted […]

Read More

Categories

November 2016
MTWTFSS
« Oct Dec »
 123456
78910111213
14151617181920
21222324252627
282930