Database Reference
In-Depth Information
Sqoop
License
Apache License, Version 2.0
Activity
High
Purpose
Transfer data between HDFS and relational databases
Official Page
http://sqoop.apache.org
Hadoop Integration Fully Integrated
It's likely that some of your data originates in a relational database management system
(RDBMS), where it is normally accessed via SQL. You could use your SQL engine to produce
flat files and load those into HDFS, and while such dumps may load large datasets quickly,
you may have reason to pull data directly from an RDBMS or to place the results of your
Hadoop processing into an RDBMS. Sqoop (short for SQL-to-Hadoop) is designed to transfer
data between Hadoop clusters and relational databases. It's a top-level Apache project,
originally developed at Cloudera and later donated to the Apache Software Foundation.
While Sqoop automates much of the process, some SQL knowledge is required to make it work
properly. Each Sqoop job is transformed into a MapReduce job that does the actual transfer.
An import starts with a database table, which Sqoop reads into HDFS as a text file or in
Avro or SequenceFile format. You can also export an HDFS file into an RDBMS; in this case,
the MapReduce job reads a set of text-delimited files in HDFS in parallel and converts
them into rows in an RDBMS table. There are options to filter rows and columns, alter
delimiters, and more.
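As a rough sketch of what these two jobs look like on the command line (the JDBC URL, credentials, table names, and HDFS paths below are placeholders, not values from this book), an import and a matching export might be invoked like this:

```shell
# Import the "customers" table into HDFS as Avro files,
# keeping only selected columns and rows. The connection
# string, user, and paths are hypothetical examples.
sqoop import \
  --connect jdbc:mysql://dbhost/sales \
  --username etl \
  --table customers \
  --columns "id,name,region" \
  --where "region = 'EMEA'" \
  --target-dir /data/customers \
  --as-avrodatafile \
  --num-mappers 4

# Export comma-delimited HDFS files back into an RDBMS table.
# Each mapper reads a slice of the files and inserts rows in parallel.
sqoop export \
  --connect jdbc:mysql://dbhost/sales \
  --username etl \
  --table customer_summary \
  --export-dir /data/summary \
  --input-fields-terminated-by ','
```

Both commands assume a running Hadoop cluster and a reachable database, so they are shown here only to illustrate the shape of a Sqoop invocation.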
Tutorial Links
There's an excellent series of lectures on this topic available on YouTube. Once you've
watched Apache Sqoop Tutorial Part 1, you can jump to Parts 2, 3, and 4.