--table cities \
--map-column-java id=Long
Discussion
The parameter --map-column-java accepts a comma-separated list where each item is a key-value pair separated by an equal sign. The exact column name is used as the key, and the target Java type is specified as the value. For example, if you need to change the mapping of three columns c1, c2, and c3 to Float, String, and String, respectively, then your Sqoop command line would contain the following fragment:
sqoop import --map-column-java c1=Float,c2=String,c3=String ...
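
To make the shape of that argument concrete, here is a minimal Java sketch that splits such a value into column-to-type pairs. It is only an illustration of the format described above, not Sqoop's own parsing code; the class name ColumnMappingSketch and the method parse are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ColumnMappingSketch {
    // Hypothetical helper, not part of Sqoop: splits a value such as
    // "c1=Float,c2=String,c3=String" into column -> Java type pairs,
    // keeping the order in which the columns were listed.
    static Map<String, String> parse(String spec) {
        Map<String, String> mapping = new LinkedHashMap<>();
        for (String item : spec.split(",")) {
            // Each item must be exactly one key and one value joined by '='.
            String[] pair = item.split("=", 2);
            if (pair.length != 2 || pair[0].trim().isEmpty() || pair[1].trim().isEmpty()) {
                throw new IllegalArgumentException("Expected column=Type but got: " + item);
            }
            mapping.put(pair[0].trim(), pair[1].trim());
        }
        return mapping;
    }

    public static void main(String[] args) {
        // Prints {c1=Float, c2=String, c3=String}
        System.out.println(parse("c1=Float,c2=String,c3=String"));
    }
}
```

The key is the exact column name and the value is the Java type name, so a typo on either side of the equal sign changes which column is affected or which type is requested.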
An example of where this parameter is handy is when your MySQL table has a primary key column that is defined as unsigned int with values that are bigger than 2,147,483,647. In this particular scenario, MySQL reports that the column has type integer, even though the real type is unsigned integer. The maximum value for an unsigned integer column in MySQL is 4,294,967,295. Because the reported type is integer, Sqoop will use Java's Integer object, which is not able to contain values larger than 2,147,483,647. In this case, you have to manually provide a hint so that Sqoop uses a more appropriate type mapping.
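
The range problem can be checked directly in Java. The following sketch uses only the standard library and assumes nothing about Sqoop internals; the class name UnsignedIntSketch and the helper fitsInInteger are hypothetical names chosen for this illustration.

```java
public class UnsignedIntSketch {
    // Largest value of a MySQL unsigned int column, as stated in the text.
    static final long MYSQL_UNSIGNED_INT_MAX = 4_294_967_295L;

    // Returns true when the value can be stored in a Java int/Integer.
    static boolean fitsInInteger(long value) {
        return value >= Integer.MIN_VALUE && value <= Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(Integer.MAX_VALUE);                      // 2147483647
        System.out.println(fitsInInteger(MYSQL_UNSIGNED_INT_MAX));  // false
        try {
            // Parsing the largest unsigned value as an Integer fails.
            Integer.parseInt("4294967295");
        } catch (NumberFormatException e) {
            System.out.println("4294967295 does not fit in Integer");
        }
        // The same value parses without trouble as a Long.
        System.out.println(Long.parseLong("4294967295"));           // 4294967295
    }
}
```

This is why mapping the column to Long, as in id=Long, is a reasonable hint for an unsigned int primary key.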
Use of this parameter is not limited to overcoming MySQL's unsigned types problem. It is further applicable to many use cases where Sqoop's default type mapping is not a good fit for your environment. Sqoop fetches all metadata from database structures without touching the stored data, so any extra knowledge about the data itself must be provided separately if you want to take advantage of it. For example, if you're using BLOB or BINARY columns for storing textual data to avoid any encoding issues, you can use the --map-column-java parameter to override the default mapping and import your data as String.
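
As a rough illustration of what treating binary data as text means, the sketch below decodes a byte array into a Java String. It assumes the bytes were written as UTF-8, which is an assumption made for this example only; it does not show how Sqoop itself converts such columns, and the names BinaryAsTextSketch and decode are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class BinaryAsTextSketch {
    // Hypothetical helper: interprets raw column bytes as UTF-8 text.
    // The charset is an assumption; use whichever encoding the
    // application used when it wrote the bytes.
    static String decode(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Simulates text that an application stored as raw bytes.
        byte[] stored = "Montr\u00e9al".getBytes(StandardCharsets.UTF_8);
        System.out.println(decode(stored).equals("Montr\u00e9al")); // true
    }
}
```

The database only knows that the column holds bytes, so the knowledge that those bytes are really text has to come from you.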
2.9. Controlling Parallelism
Problem
Sqoop by default uses four concurrent map tasks to transfer data to Hadoop. Transferring bigger tables with more concurrent tasks should decrease the time required to transfer all data. You want the flexibility to change the number of map tasks used on a per-job basis.