Sqoop - Hadoop: The Definitive Guide


time-based incremental imports (specified by --incremental lastmodified), which are appropriate when existing rows may be updated and there is a column (the check column) that records when each row was last modified.
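To make the check-column logic concrete, here is a minimal Python sketch of how a lastmodified import selects rows. It is a simplified in-memory model, not Sqoop's implementation: the function `lastmodified_import`, the row layout, and the half-open window [last value, current time) are assumptions made for illustration. The returned upper bound plays the role of the value to use as --last-value on the following run.

```python
from datetime import datetime
from typing import Any, Dict, List, Optional, Tuple

Row = Dict[str, Any]


def lastmodified_import(
    rows: List[Row],
    check_column: str,
    last_value: Optional[datetime],
    now: datetime,
) -> Tuple[List[Row], datetime]:
    """Select rows whose check column lies in [last_value, now).

    Returns the selected rows and the value to use as the next --last-value.
    A last_value of None means "first import": everything before now.
    """
    selected = [
        row
        for row in rows
        if (last_value is None or row[check_column] >= last_value)
        and row[check_column] < now
    ]
    # This run's upper bound becomes the next run's lower bound, so a row
    # that is updated later (getting a newer timestamp) is imported again.
    return selected, now


if __name__ == "__main__":
    # Hypothetical table with a last_modified check column.
    table = [
        {"id": 1, "last_modified": datetime(2024, 1, 1, 9, 0)},
        {"id": 2, "last_modified": datetime(2024, 1, 2, 9, 0)},
    ]
    first, next_value = lastmodified_import(
        table, "last_modified", None, datetime(2024, 1, 3)
    )
    print([r["id"] for r in first], next_value)

    # Row 1 is updated after the first import, so its timestamp moves forward.
    table[0]["last_modified"] = datetime(2024, 1, 4, 12, 0)
    second, _ = lastmodified_import(
        table, "last_modified", next_value, datetime(2024, 1, 5)
    )
    print([r["id"] for r in second])
```

Because the updated row's timestamp moves past the previous upper bound, the second run picks it up again while skipping the row that did not change.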

At the end of an incremental import, Sqoop prints the value to specify as --last-value on the next import. This is useful when running incremental imports manually, but for periodic imports it is better to use Sqoop's saved job facility, which automatically stores the last value and uses it on the next job run. Type sqoop job --help for usage instructions for saved jobs.
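The bookkeeping behind a saved job can be modelled in a few lines. The sketch below is hypothetical: `SavedJobStore` is an in-memory stand-in for Sqoop's job storage, and the connect string, table, and column names are invented. It shows only the idea that the stored last value is appended as --last-value whenever the command is rebuilt, so nobody has to copy it by hand.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SavedJob:
    import_args: List[str]             # arguments fixed when the job is created
    last_value: Optional[str] = None   # updated after every successful run


class SavedJobStore:
    """Hypothetical stand-in for Sqoop's saved-job storage (in memory only)."""

    def __init__(self) -> None:
        self._jobs: Dict[str, SavedJob] = {}

    def create(self, name: str, import_args: List[str]) -> None:
        self._jobs[name] = SavedJob(list(import_args))

    def command_for(self, name: str) -> List[str]:
        """Rebuild the import command, adding the stored last value if any."""
        job = self._jobs[name]
        cmd = ["sqoop", "import", *job.import_args]
        if job.last_value is not None:
            cmd += ["--last-value", job.last_value]
        return cmd

    def record_run(self, name: str, new_last_value: str) -> None:
        # Persist the value an interactive user would otherwise copy by hand.
        self._jobs[name].last_value = new_last_value


if __name__ == "__main__":
    store = SavedJobStore()
    store.create("widgets-sync", [
        "--connect", "jdbc:mysql://localhost/exampledb",  # hypothetical database
        "--table", "widgets",                             # hypothetical table
        "--incremental", "lastmodified",
        "--check-column", "last_modified",
    ])
    print(store.command_for("widgets-sync"))   # first run: no --last-value
    store.record_run("widgets-sync", "2024-01-03 00:00:00")
    print(store.command_for("widgets-sync"))   # later runs reuse the stored value
```

With the real tool, the equivalent workflow is to define the job once with sqoop job --create <name> -- import ... and run it with sqoop job --exec <name>; check sqoop job --help for the exact options in your version.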

Direct-Mode Imports

Sqoop's architecture allows it to choose from multiple available strategies for performing an import. Most databases will use the DataDrivenDBInputFormat-based approach described earlier. Some databases, however, offer specific tools designed to extract data quickly. For example, MySQL's mysqldump application can read from a table with greater throughput than a JDBC channel. The use of these external tools is referred to as direct mode in Sqoop's documentation. Direct mode must be explicitly enabled by the user (via the --direct argument), as it is not as general-purpose as the JDBC approach. (For example, MySQL's direct mode cannot handle large objects, such as CLOB or BLOB columns; that is why Sqoop needs to use a JDBC-specific API to load these columns into HDFS.)
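The choice between the two strategies can be summarized as a small decision function. This is a hypothetical simplification written for illustration, not Sqoop's code: `choose_import_strategy` infers the vendor from the JDBC connect string, treats --direct as strictly opt-in, and falls back to JDBC for vendors without a direct path and for MySQL tables with LOB columns. The real tool may instead report an error in some of these cases.

```python
# Vendors for which the text says Sqoop offers a direct mode.
DIRECT_CAPABLE = {"mysql", "postgresql", "oracle", "netezza"}


def choose_import_strategy(connect_string: str, direct: bool,
                           has_lob_columns: bool) -> str:
    """Return "direct" or "jdbc" for a hypothetical import plan."""
    # JDBC URLs look like jdbc:<vendor>:..., so the vendor is the second field.
    vendor = connect_string.split(":")[1].lower()
    if not direct:
        return "jdbc"   # direct mode is strictly opt-in (--direct)
    if vendor not in DIRECT_CAPABLE:
        return "jdbc"   # no external tool to delegate to
    if vendor == "mysql" and has_lob_columns:
        return "jdbc"   # MySQL direct mode cannot handle CLOB/BLOB columns
    return "direct"


if __name__ == "__main__":
    print(choose_import_strategy("jdbc:mysql://localhost/shop", True, False))
    print(choose_import_strategy("jdbc:mysql://localhost/shop", True, True))
```

Keeping the default on the JDBC path reflects the point made above: it works for any database with a driver, whereas direct mode trades generality for throughput.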

For databases that provide such tools, Sqoop can use these to great effect. A direct-mode import from MySQL is usually much more efficient (in terms of map tasks and time required) than a comparable JDBC-based import. Sqoop still launches multiple map tasks in parallel; each task then spawns an instance of the mysqldump program and reads its output. Sqoop can also perform direct-mode imports from PostgreSQL, Oracle, and Netezza.
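The parallel structure of a MySQL direct-mode import can be sketched as follows. Assuming the table is split on an integer key, each hypothetical map task gets a contiguous key range and a mysqldump argument list restricted to that range with mysqldump's --where option. The helper names, the even-split rule, and the choice of options (--no-create-info, --compact) are assumptions; this sketch does not reproduce the arguments Sqoop actually passes. Nothing is executed: in a real task, each list would be handed to a process launcher such as subprocess.Popen and its output read back.

```python
from typing import List, Tuple


def split_key_range(lo: int, hi: int, num_tasks: int) -> List[Tuple[int, int]]:
    """Split the inclusive key range [lo, hi] into up to num_tasks pieces."""
    if hi < lo:
        return []
    total = hi - lo + 1
    num_tasks = max(1, min(num_tasks, total))
    base, extra = divmod(total, num_tasks)
    ranges = []
    start = lo
    for i in range(num_tasks):
        # The first `extra` tasks take one additional key each.
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size - 1))
        start += size
    return ranges


def mysqldump_args(database: str, table: str, key: str,
                   bounds: Tuple[int, int]) -> List[str]:
    """Build the argument list one hypothetical map task would launch."""
    low, high = bounds
    return [
        "mysqldump",
        "--no-create-info",   # rows only, no CREATE TABLE statements
        "--compact",          # less verbose output
        f"--where={key} >= {low} AND {key} <= {high}",
        database,
        table,
    ]


if __name__ == "__main__":
    for task_id, bounds in enumerate(split_key_range(1, 10, 3)):
        print(task_id, mysqldump_args("shop", "orders", "id", bounds))
```

Splitting on the key keeps the tasks independent, which is what lets them run in parallel; a skewed key distribution would still give tasks of uneven actual size.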

Even when direct mode is used to access the contents of a database, the metadata is still queried through JDBC.
