Databases Reference
In-Depth Information
Discussion
Rather than using the JDBC interface for transferring data, the direct mode delegates
the job of transferring data to the native utilities provided by the database vendor. In
the case of MySQL, the mysqldump and mysqlimport will be used for retrieving data
from the database server or moving data back. In the case of PostgreSQL, Sqoop will
take advantage of the pg_dump utility to import data. Using native utilities will greatly
improve performance, as they are optimized to provide the best possible transfer speed
while putting less burden on the database server. There are several limitations that come
with this faster import. For one, not all databases have available native utilities. This
mode is not available for every supported database. Out of the box, Sqoop has direct
support only for MySQL and PostgreSQL.
Because all data transfer operations are performed inside generated MapReduce jobs
and because the data transfer is being deferred to native utilities in direct mode, you
will need to make sure that those native utilities are available on all of your Hadoop
TaskTracker nodes. For example, in the case of MySQL, each node hosting a TaskTracker
service needs to have both mysqldump and mysqlimport utilities installed.
Another limitation of the direct mode is that not all parameters are supported. As the
native utilities usually produce text output, binary formats like SequenceFile or Avro
won't work. Also, parameters that customize the escape characters, type mapping, col‐
umn and row delimiters, or the NULL substitution string might not be supported in all
cases.
See Also
Sqoop also supports the pg_bulkload utility for PostgreSQL via a special build-in con‐
nector. You can find more information about that in Recipe 7.3 .
2.8. Overriding Type Mapping
Problem
The default type mapping that Sqoop provides between relational databases and Hadoop
usually works well. You have use cases requiring you to override the mapping.
Solution
Use Sqoop's ability to override default type mapping using the parameter --map-
column-java . For example, to override the type of column id to Java type Long :
sqoop import \
--connect jdbc:mysql://mysql.example.com/sqoop \
--username sqoop \
Search WWH ::




Custom Search