Using Sqoop To Move Data To Hadoop

Kevin Feasel

2017-07-18

Hadoop

The folks at Redglue have a few hints on using Sqoop to move data from a relational database to Hadoop:

  • “Data gets updated” problem

Data gets updated many times, and loading data with Sqoop is not a single event, because the data you are importing can change (rows INSERTed, DELETEd or UPDATEd). What is important here is that HDFS is an “append-only filesystem” (exceptions made for HBase and Hive with ACID, but they are mostly tricks), and the options are pretty simple: replace the dataset, add data to the dataset (a partition, for example), or merge the old and new datasets.

If the data that you are loading is a small dataset, don’t think twice: replace and overwrite it.

If the data that you are loading is a big dataset, an “incremental” load is recommended. This can be a little tricky, as Sqoop needs to know what modifications were made since the last incremental or full import.
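
To make the replace-versus-incremental choice concrete, here is a minimal Python sketch that only assembles the argument list for a `sqoop import` call. It is not from the Redglue post. The function name `build_sqoop_import`, the JDBC URL, the table, and the column names are all hypothetical, and the flags are the ones I know from the Sqoop 1.4.x command line, so check them against your version before relying on this.

```python
from typing import List, Optional


def build_sqoop_import(
    connect: str,
    table: str,
    target_dir: str,
    check_column: Optional[str] = None,
    last_value: Optional[str] = None,
    mode: str = "append",
    merge_key: Optional[str] = None,
) -> List[str]:
    """Assemble (but do not run) a `sqoop import` command line.

    Hypothetical helper for illustration only.
    With no check_column: full reload that replaces target_dir.
    With a check_column: incremental import of rows past last_value.
    """
    args = [
        "sqoop", "import",
        "--connect", connect,
        "--table", table,
        "--target-dir", target_dir,
    ]

    if check_column is None:
        # Small dataset: replace and overwrite the existing directory.
        args.append("--delete-target-dir")
        return args

    if mode not in ("append", "lastmodified"):
        raise ValueError("mode must be 'append' or 'lastmodified'")

    # Big dataset: tell Sqoop which column marks new or changed rows.
    args += ["--incremental", mode, "--check-column", check_column]
    if last_value is not None:
        # Only rows beyond this value are imported.
        args += ["--last-value", last_value]
    if mode == "lastmodified" and merge_key is not None:
        # Merge updated rows into the old dataset on this key.
        args += ["--merge-key", merge_key]
    return args


if __name__ == "__main__":
    # Assumed example values; none of these come from the original post.
    cmd = build_sqoop_import(
        connect="jdbc:mysql://db.example.com/sales",
        table="orders",
        target_dir="/data/raw/orders",
        check_column="updated_at",
        last_value="2017-07-17 00:00:00",
        mode="lastmodified",
        merge_key="order_id",
    )
    print(" ".join(cmd))
```

The tricky part mentioned in the quote is the `--last-value` bookkeeping: something has to remember where the previous import stopped. A saved Sqoop job can track that for you, or your own scheduler can store it.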

I’m not a huge fan of Sqoop and prefer to use my own ingest mechanisms, but it’s an easy way to get started.

