which contains the name of the partition column, and --hive-partition-value, which specifies the desired value. For example, if your partition column is called day and you want to import your data into the value 2013-05-22, you would use the following command:
sqoop import \
  --connect jdbc:mysql://mysql.example.com/sqoop \
  --username sqoop \
  --password sqoop \
  --table cities \
  --hive-import \
  --hive-partition-key day \
  --hive-partition-value "2013-05-22"
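Once the import finishes, you can confirm from the Hive shell that the partition was created. The statements below are a quick check, assuming the cities table name from the example above; the exact output format of the partition listing depends on your Hive version.

```sql
-- List the partitions that now exist for the cities table;
-- the Sqoop import above should have created day=2013-05-22.
SHOW PARTITIONS cities;

-- The partition column behaves like a regular column in queries,
-- so you can filter on it even though it is not stored in the data files.
SELECT * FROM cities WHERE day = '2013-05-22';
```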
Discussion
Sqoop mandates that the partition column be of type STRING. The current implementation is limited to a single partition level. Unfortunately, you can't use this feature if your table has more than one level of partitioning (e.g., if you would like a partition by day followed by a partition by hour). This limitation will most likely be removed in future Sqoop releases.
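Until multilevel partitioning is supported, one possible workaround is to import into a plain HDFS directory (without --hive-import) and attach the files to a multilevel Hive partition by hand. The sketch below is not part of Sqoop itself; the HDFS path, the hour value, and a cities table created with PARTITIONED BY (day STRING, hour STRING) are all assumptions.

```sql
-- Hypothetical workaround for two-level partitioning (day, then hour).
-- First run something like:
--   sqoop import --connect ... --table cities --target-dir /user/sqoop/cities
-- Then register the partition and move the imported files into it.
ALTER TABLE cities ADD IF NOT EXISTS PARTITION (day = '2013-05-22', hour = '10');

LOAD DATA INPATH '/user/sqoop/cities'
  INTO TABLE cities PARTITION (day = '2013-05-22', hour = '10');
```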
Hive's partition support is implemented with virtual columns that are not part of the data itself. Each partition operation must contain the name and value of the partition. Sqoop can't use your data to determine which partition a given row should go into. Instead, Sqoop relies on the user to specify the parameter --hive-partition-value with an appropriate value. Sqoop won't accept a column name for this parameter.
6.7. Replacing Special Delimiters During Hive Import
Problem
You've imported the data directly into Hive using Sqoop's --hive-import feature. When you run a SELECT count(*) FROM your_table query to see how many rows are in the imported table, you get a larger number than is stored in the source table on the relational database side.