An Apache Sqoop Tutorial

Kevin Feasel

2017-11-22

ETL, Hadoop

Subham Sinha has an introductory-level tutorial on Apache Sqoop:

For Hadoop developer, the actual game starts after the data is being loaded in HDFS. They play around this data in order to gain various insights hidden in the data stored in HDFS.

So, for this analysis the data residing in the relational database management systems need to be transferred to HDFS. The task of writing MapReduce code for importing and exporting data from relational database to HDFS is uninteresting & tedious. This is where Apache Sqoop comes to rescue and removes their pain. It automates the process of importing & exporting the data.

Sqoop makes the life of developers easy by providing CLI for importing and exporting data. They just have to provide basic information like database authentication, source, destination, operations etc. It takes care of remaining part.

Sqoop internally converts the command into MapReduce tasks, which are then executed over HDFS. It uses YARN framework to import and export the data, which provides fault tolerance on top of parallelism.

In my experience, Sqoop does two things really well:  first, it lets you move data from a relational database into HDFS (or Hive).  Second, it lets you move data from HDFS (or Hive) into a staging table on a relational database.  That can make Sqoop a useful part of an ETL process.

Related Posts

Mounting HDFS As A Local Filesystem

Guy Shilo looks at two techniques for mounting HDFS as a local filesystem: NFS Gateway is a HDFS component that enables the use to expose HDFS through NFS3 interface so that Linux machines can mount it and access it just as a local filesystem. The manual installation is quite cumbersome and is covered here. Cloudera manager […]

Read More

How Humio Uses Kafka

Kresten Krab describes ways that Humio uses Apache Kafka for their product: Humio is a log analytics system built to run both on-prem and as a hosted offering. It is designed for “on-prem first” because, in many logging use cases, you need the privacy and security of managing your own logging solution. And because volume […]

Read More

Categories

November 2017
MTWTFSS
« Oct Dec »
 12345
6789101112
13141516171819
20212223242526
27282930