Imports: A Deeper Look
As mentioned earlier, Sqoop imports a table from a database by running a MapReduce job that extracts rows from the table and writes the records to HDFS. How does MapReduce read the rows? This section explains how Sqoop works under the hood.
At a high level, Figure 15-1 demonstrates how Sqoop interacts with both the database
source and Hadoop. Like Hadoop itself, Sqoop is written in Java. Java provides an API
called Java Database Connectivity, or JDBC, that allows applications to access data stored
in an RDBMS as well as to inspect the nature of this data. Most database vendors provide a
JDBC driver that implements the JDBC API and contains the necessary code to connect to
their database servers.
NOTE
Based on the URL in the connect string used to access the database, Sqoop attempts to predict which
driver it should load. You still need to download the JDBC driver itself and install it on your Sqoop client.
For cases where Sqoop does not know which JDBC driver is appropriate, users can specify the JDBC
driver explicitly with the --driver argument. This capability allows Sqoop to work with a wide variety
of database platforms.
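To make the driver-prediction step concrete, the sketch below shows one way scheme-based inference could work: parse the subprotocol out of the connect string (for example, "mysql" from "jdbc:mysql://...") and look it up in a table of driver class names. This is an illustrative sketch only; the class name, the lookup table, and the exact URLs are assumptions, not Sqoop's actual implementation.

```java
import java.util.Map;

// Hypothetical sketch of scheme-based JDBC driver inference, similar in
// spirit to what Sqoop does with the connect string. The URL-to-driver
// mappings below are illustrative, not Sqoop's real table.
public class DriverInference {
    private static final Map<String, String> DRIVERS = Map.of(
        "mysql", "com.mysql.jdbc.Driver",
        "postgresql", "org.postgresql.Driver",
        "oracle", "oracle.jdbc.OracleDriver");

    // Extract the subprotocol from a JDBC URL, e.g. "mysql" from
    // "jdbc:mysql://db.example.com/corp", and look up a driver class.
    // Returns null when no driver is known, which is the case where a
    // user would supply --driver explicitly.
    public static String inferDriver(String connectString) {
        String[] parts = connectString.split(":");
        if (parts.length < 2 || !parts[0].equals("jdbc")) {
            return null; // not a JDBC URL
        }
        return DRIVERS.get(parts[1]);
    }

    public static void main(String[] args) {
        System.out.println(inferDriver("jdbc:mysql://db.example.com/corp"));
        System.out.println(inferDriver("jdbc:unknown://host/db"));
    }
}
```

The null case corresponds to the situation described above: when the subprotocol is unrecognized, the user must name the driver class explicitly with --driver.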
Before the import can start, Sqoop uses JDBC to examine the table it is to import. It retrieves a list of all the columns and their SQL data types. These SQL types (VARCHAR, INTEGER, etc.) can then be mapped to Java data types (String, Integer, etc.), which will hold the field values in MapReduce applications. Sqoop's code generator will use this information to create a table-specific class to hold a record extracted from the table.
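The SQL-to-Java mapping can be sketched with the constants in java.sql.Types, which is what JDBC reports for each column. The mapping below covers only a few common types and is an assumption for illustration; Sqoop's real mapping is more complete.

```java
import java.sql.Types;

// Illustrative sketch of the SQL-to-Java type mapping a code generator
// like Sqoop's relies on. Given the JDBC type code for a column, return
// the name of the Java type that will hold the field value.
public class TypeMapping {
    public static String javaTypeFor(int sqlType) {
        switch (sqlType) {
            case Types.CHAR:
            case Types.VARCHAR:
            case Types.LONGVARCHAR:
                return "String";
            case Types.INTEGER:
                return "Integer";
            case Types.BIGINT:
                return "Long";
            case Types.FLOAT:
            case Types.DOUBLE:
                return "Double";
            case Types.DATE:
                return "java.sql.Date";
            case Types.TIMESTAMP:
                return "java.sql.Timestamp";
            default:
                return null; // type not covered by this sketch
        }
    }

    public static void main(String[] args) {
        System.out.println(javaTypeFor(Types.VARCHAR));
        System.out.println(javaTypeFor(Types.INTEGER));
    }
}
```

A generated record class would then declare one field of the mapped Java type per column, giving MapReduce a typed container for each extracted row.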