Clients For Working With HDFS

Kevin Feasel

2018-10-23

Hadoop

Mark Litwintschik reviews several clients for working with the Hadoop Distributed Filesystem:

The Hadoop Distributed File System (HDFS) allows you to both federate storage across many computers as well as distribute files in a redundant manor across a cluster. HDFS is a key component to many storage clusters that possess more than a petabyte of capacity.

Each computer acting as a storage node in a cluster can contain one or more storage devices. This can allow several mechanical storage drives to both store data more reliably than SSDs, keep the cost per gigabyte down as well as go some way to exhausting the SATA bus capacity of a given system.

Hadoop ships with a feature-rich and robust JVM-based HDFS client. For many that interact with HDFS directly it is the go-to tool for any given task. That said, there is a growing population of alternative HDFS clients. Some optimise for responsiveness while others make it easier to utilise HDFS in Python applications. In this post I’ll walk through a few of these offerings.

Read on for reviews of those offerings.

Related Posts

Testing Kafka Streams Applications

Yeva Byzek continues her series on testing Kafka-based streaming applications: When you create a stream processing application with Kafka’s Streams API, you create a Topologyeither using the StreamsBuilder DSL or the low-level Processor API. Normally, the topology runs with the KafkaStreams class, which connects to a Kafka cluster and begins processing when you call start(). For testing though, connecting to a running […]

Read More

Auto ML With SQL Server 2019 Big Data Clusters

Marco Inchiosa has a model scenario for using Big Data Clusters to scale out a machine learning problem: H2O provides popular open source software for data science and machine learning on big data, including Apache SparkTM integration. It provides two open source python AutoML classes: h2o.automl.H2OAutoML and pysparkling.ml.H2OAutoML. Both APIs use the same underlying algorithm implementations, […]

Read More

Categories

October 2018
MTWTFSS
« Sep Nov »
1234567
891011121314
15161718192021
22232425262728
293031