Databases Reference
In-Depth Information
Solution
You can take advantage of the update feature that will issue
UPDATE
instead of
INSERT
statements. The update mode is activated by using the parameter
--update-key
that
contains the name of a column that can identify a changed row—usually the primary
key of a table. For example, the following command allows you to use the column
id
of
table
cities
:
sqoop
export
\
--connect jdbc:mysql://mysql.example.com/sqoop
\
--username sqoop
\
--password sqoop
\
--table cities
\
--update-key id
Discussion
The parameter
--update-key
is used to instruct Sqoop to update existing rows rather
than insert new ones. This parameter requires a comma-separated list of columns that
should be used to uniquely identify a row. All of those columns will be used in the
WHERE
clause of the generated
UPDATE
query. All other table columns will be used in the
SET
part of the query. For example, for a table containing four columns (
c1
,
c2
,
c3
, and
c4
),
calling Sqoop with the
--update-key c2,c4
will generate the following update query:
UPDATE
table
SET
c1
=
?
,
c3
=
?
WHERE
c2
=
?
and
c3
=
?
It's very important to understand the structure of the query to see how the update mode
will export data from Hadoop. First of all, the columns used to identify the row will
never be updated because they are not part of the
SET
clause. Also, if your data in Hadoop
contains some completely new rows, the
WHERE
clause will not match any rows on the
database side. Such an operation on the database side is fully valid, but it results in no
updated rows. Therefore, new rows are not exported in update mode at all.
5.5. Updating or Inserting at the Same Time
Problem
You have data in your database from a previous export, but now you need to propagate
updates from Hadoop. Unfortunately, you can't use the update mode, as you have a
considerable number of new rows and you need to export them as well.
Solution
If you need both updates and inserts in the same job, you can activate the so-called
upsert
mode with the
--update-mode allowinsert
parameter. For example: