which contains the name of the partition column, and --hive-partition-value, which specifies the desired value. For example, if your partition column is called day and you want to import your data into the value 2013-05-22, you would use the following command:
sqoop import \
  --connect jdbc:mysql://mysql.example.com/sqoop \
  --username sqoop \
  --password sqoop \
  --table cities \
  --hive-import \
  --hive-partition-key day \
  --hive-partition-value "2013-05-22"
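Once the import finishes, you can confirm from the Hive shell that the partition was created. The statements below are a quick check, assuming the cities table name from the example above; the exact output format of the partition listing depends on your Hive version.

```sql
-- List the partitions that now exist for the cities table;
-- the Sqoop import above should have created day=2013-05-22.
SHOW PARTITIONS cities;

-- The partition column behaves like a regular column in queries,
-- so you can filter on it even though it is not stored in the data files.
SELECT * FROM cities WHERE day = '2013-05-22';
```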
Discussion
Sqoop mandates that the partition column be of type STRING. The current implementation is limited to a single partition level. Unfortunately, you can't use this feature if your table has more than one level of partitioning (e.g., if you would like a partition by day followed by a partition by hour). This limitation will most likely be removed in future Sqoop releases.
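Until multilevel partitioning is supported, one possible workaround is to import into a plain HDFS directory (without --hive-import) and attach the files to a multilevel Hive partition by hand. The sketch below is not part of Sqoop itself; the HDFS path, the hour value, and a cities table created with PARTITIONED BY (day STRING, hour STRING) are all assumptions.

```sql
-- Hypothetical workaround for two-level partitioning (day, then hour).
-- First run something like:
--   sqoop import --connect ... --table cities --target-dir /user/sqoop/cities
-- Then register the partition and move the imported files into it.
ALTER TABLE cities ADD IF NOT EXISTS PARTITION (day = '2013-05-22', hour = '10');

LOAD DATA INPATH '/user/sqoop/cities'
  INTO TABLE cities PARTITION (day = '2013-05-22', hour = '10');
```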
Hive's partition support is implemented with virtual columns that are not part of the data itself. Each partition operation must contain the name and value of the partition. Sqoop can't use your data to determine which partition a given row should go into. Instead, Sqoop relies on the user to specify the parameter --hive-partition-value with an appropriate value. Sqoop won't accept a column name for this parameter.
6.7. Replacing Special Delimiters During Hive Import
Problem
You've imported the data directly into Hive using Sqoop's --hive-import feature. When you run a SELECT count(*) FROM your_table query to see how many rows are in the imported table, you get a larger number than is stored in the source table on the relational database side.