S3 Or EBS?

Devadutta Ghat, et al, compare Amazon S3 versus Elastic Block Storage (EBS) on the basis of cost and Apache Impala performance:

EBS is attached to the AWS compute node as a fully-functional filesystem (similar to an attached SSD on an on-premise node), and Impala makes use of several filesystem features to deliver higher throughput and lower latency. These features include:

  • HDFS short-circuit reads to bypass HDFS and read files directly from the filesystem
  • OS buffer cache to read frequently accessed files directly from the cache instead of fetching it again
  • Fixed-cost file renames through metadata operations

In contrast, S3 is an object store that is accessed over the network. However, with S3, throughput is better than simple network-attached storage because of its dedicated, high-performance networks. In Cloudera’s internal benchmark testing (detailed below), on an r3.2xlarge, we saw a consistent throughput of about 100MB/s. Furthermore, in S3, there is currently no equivalent to HDFS short-circuit reads. Move/rename operations for data stored in S3 is a copy followed by a delete, while a file move on HDFS is a metadata operation—which is usually problematic for ETL workloads, as they create large number of small files that are typically moved.

It looks like EBS is a solid choice for many workloads.

Related Posts

Flink’s State Processor API

Seth Wiesman and Fabian Hueske show off Apache Flink’s State Processor API: The State Processor API that comes with Flink 1.9 is a true game-changer in how you can work with application state! In a nutshell, it extends the DataSet API with Input and OutputFormats to read and write savepoint or checkpoint data. Due to […]

Read More

Derivative Event Sourcing

Anna McDonald explains the concept of derivative event sourcing: If you happen to be the proud owner of a single order service, then you are all set to begin. But what if you have more than one order service? Something that tends to happen at companies that have been around for more than a sprint […]

Read More

Categories

September 2016
MTWTFSS
« Aug Oct »
 1234
567891011
12131415161718
19202122232425
2627282930