Databases Reference
In-Depth Information
Solution
You can take advantage of the update feature that will issue UPDATE instead of INSERT
statements. The update mode is activated by using the parameter --update-key that
contains the name of a column that can identify a changed row—usually the primary
key of a table. For example, the following command allows you to use the column id of
table cities :
sqoop export \
--connect jdbc:mysql://mysql.example.com/sqoop \
--username sqoop \
--password sqoop \
--table cities \
--update-key id
Discussion
The parameter --update-key is used to instruct Sqoop to update existing rows rather
than insert new ones. This parameter requires a comma-separated list of columns that
should be used to uniquely identify a row. All of those columns will be used in the WHERE
clause of the generated UPDATE query. All other table columns will be used in the SET
part of the query. For example, for a table containing four columns ( c1 , c2 , c3 , and c4 ),
calling Sqoop with the --update-key c2,c4 will generate the following update query:
UPDATE table SET c1 = ? , c3 = ? WHERE c2 = ? and c3 = ?
It's very important to understand the structure of the query to see how the update mode
will export data from Hadoop. First of all, the columns used to identify the row will
never be updated because they are not part of the SET clause. Also, if your data in Hadoop
contains some completely new rows, the WHERE clause will not match any rows on the
database side. Such an operation on the database side is fully valid, but it results in no
updated rows. Therefore, new rows are not exported in update mode at all.
5.5. Updating or Inserting at the Same Time
Problem
You have data in your database from a previous export, but now you need to propagate
updates from Hadoop. Unfortunately, you can't use the update mode, as you have a
considerable number of new rows and you need to export them as well.
Solution
If you need both updates and inserts in the same job, you can activate the so-called
upsert mode with the --update-mode allowinsert parameter. For example:
Search WWH ::




Custom Search