An Apache Sqoop Tutorial

Kevin Feasel

2017-11-22

ETL, Hadoop

Subham Sinha has an introductory-level tutorial on Apache Sqoop:

For Hadoop developer, the actual game starts after the data is being loaded in HDFS. They play around this data in order to gain various insights hidden in the data stored in HDFS.

So, for this analysis the data residing in the relational database management systems need to be transferred to HDFS. The task of writing MapReduce code for importing and exporting data from relational database to HDFS is uninteresting & tedious. This is where Apache Sqoop comes to rescue and removes their pain. It automates the process of importing & exporting the data.

Sqoop makes the life of developers easy by providing CLI for importing and exporting data. They just have to provide basic information like database authentication, source, destination, operations etc. It takes care of remaining part.

Sqoop internally converts the command into MapReduce tasks, which are then executed over HDFS. It uses YARN framework to import and export the data, which provides fault tolerance on top of parallelism.

In my experience, Sqoop does two things really well:  first, it lets you move data from a relational database into HDFS (or Hive).  Second, it lets you move data from HDFS (or Hive) into a staging table on a relational database.  That can make Sqoop a useful part of an ETL process.

Related Posts

Replicating Data In HDFS Between Clusters

Murali Ramasami and Niru Anisetti have an article showing how to use the Hortonworks Data Lifecycle Manager to set up replication between two Hadoop clusters: Data Lifecycle Manager (DLM) delivers on the promise of location-agnostic, secure replication by encapsulating and copying data seamlessly across physical private storage and public cloud environments. This empowers businesses to […]

Read More

Installing Confluent Platform On Windows

Niels Berglund shows how to install Confluent Platform (the Confluent branded version of Apache Kafka) on a Windows machine using the Windows Subsystem for Linux: WSL is primarily aimed at developers, and it allows you to run Linux environments directly on Windows in a native format and without the overhead of a virtual machine. Let us […]

Read More

Categories

November 2017
MTWTFSS
« Oct Dec »
 12345
6789101112
13141516171819
20212223242526
27282930