Databases Reference
In-Depth Information
4.1. Importing Data from Two Tables
Problem
You need to import one main table; however, this table is normalized. The important
values are stored in the referenced dictionary tables, and the main table contains only
numeric foreign keys pointing to the values in the dictionaries rather than to natural
keys as in the original cities table. You would prefer to resolve the values prior to
running Sqoop and import the real values rather than the numerical keys for the
countries.
Solution
Instead of using table import, use free-form query import. In this mode, Sqoop will
allow you to specify any query for importing data. Instead of the parameter --table ,
use the parameter --query with the entire query for obtaining the data you would like
to transfer.
Let's look at an example with the normalized table normcities and its dictionary coun
tries . In order to achieve the same output as with importing the denormalized table
cities , you could use the following Sqoop command:
sqoop import \
--connect jdbc:mysql://mysql.example.com/sqoop \
--username sqoop \
--password sqoop \
--query 'SELECT normcities.id, \
countries.country, \
normcities.city \
FROM normcities \
JOIN countries USING(country_id) \
WHERE $CONDITIONS' \
--split-by id \
--target-dir cities
Discussion
The free-form query import is one of the advanced features of Sqoop. As with all ad‐
vanced software features, it gives you great power. With great power comes significant
responsibility.
There is a lot to be aware of when using free-form query imports. By using query im‐
ports, Sqoop can't use the database catalog to fetch the metadata. This is one of the
reasons why using table import might be faster than the equivalent free-form query
import. Also, you have to manually specify some additional parameters that would
otherwise be populated automatically. In addition to the --query parameter, you need
Search WWH ::




Custom Search