a type compatible with Hive. Usually this conversion is straightforward: for example,
JDBC types VARCHAR , CHAR , and other string-based types are all mapped to Hive STRING .
Sometimes the default mapping doesn't fit your needs; in those cases, you can use the parameter --map-column-hive to override it. This parameter expects a comma-separated list of key-value pairs, each joined by an equal sign (=), specifying which column should be mapped to which type in Hive. For example, to change the Hive type of column id to STRING and column price to DECIMAL, you would specify the following Sqoop parameters:
sqoop import \
  ...
  --hive-import \
  --map-column-hive id=STRING,price=DECIMAL
During a Hive import, Sqoop first performs a normal HDFS import to a temporary location. After that import succeeds, Sqoop generates two Hive queries: one to create the table and another to load the data from the temporary location. You can choose the temporary location with either the --target-dir or --warehouse-dir parameter. It's important not to use Hive's warehouse directory (usually /user/hive/warehouse) as the temporary location, as doing so may cause issues when loading the data in the second step.
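For example, a staging directory outside the Hive warehouse can be selected with --target-dir (a sketch; the connect string, credentials, table name, and path here are hypothetical):

```shell
sqoop import \
  --connect jdbc:mysql://mysql.example.com/sqoop \
  --username sqoop \
  --table cities \
  --hive-import \
  --target-dir /tmp/cities_staging
```

After the load step completes, Sqoop removes the data from the staging directory, so a path under /tmp is typically safe to reuse for subsequent imports.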
If your table already exists and contains data, Sqoop will append the newly imported data to it. You can change this behavior by using the parameter --hive-overwrite, which instructs Sqoop to truncate the existing Hive table and load only the newly imported data. This parameter is very helpful when you need to refresh a Hive table's data on a periodic basis.
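A periodic (for example, daily) full refresh might look like the following sketch, where the connect string and table name are hypothetical:

```shell
sqoop import \
  --connect jdbc:mysql://mysql.example.com/sqoop \
  --username sqoop \
  --table cities \
  --hive-import \
  --hive-overwrite
```

Without --hive-overwrite, running the same command every day would keep appending duplicate rows to the Hive table; with it, each run replaces the table's contents with the latest snapshot.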
See Also
When you're overriding the Hive type, you might also need to override the Java mapping described in Recipe 2.8.
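To keep the two mappings consistent, you can override both in one invocation using --map-column-java (the parameter covered in Recipe 2.8) alongside --map-column-hive. A sketch, assuming the same id column as above:

```shell
sqoop import \
  ...
  --hive-import \
  --map-column-java id=String \
  --map-column-hive id=STRING
```

Note that --map-column-java takes Java type names (String, Integer), while --map-column-hive takes Hive type names (STRING, INT).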
6.6. Using Partitioned Hive Tables
Problem
You want to import data into Hive on a regular basis (for example, daily), and for that
purpose your Hive table is partitioned. You would like Sqoop to automatically import
data into the partition rather than only to the table.
Solution
Sqoop supports Hive partitioning out of the box. In order to take advantage of this
functionality, you need to specify two additional parameters: --hive-partition-key ,