--password sqoop \
--table visits \
--incremental lastmodified \
--check-column last_update_date \
--last-value "2013-05-22 01:01:01"
Discussion
The incremental mode lastmodified requires a column holding a date value (suitable
types are date, time, datetime, and timestamp) containing information as to when each
row was last updated. Sqoop will import only those rows that were updated after the
last import. This column should be populated to the current time on every new row
insertion or on a change to an existing row. This ensures that Sqoop can pick up changed
rows accurately. Sqoop knows only what you tell it. The onus is on your application to
reliably update this column on every row change. Any row whose check column (the one
named in the --check-column parameter) was not updated won't be imported.
Internally, the lastmodified incremental import consists of two standalone
MapReduce jobs. The first job will import the delta of changed data similarly to normal
import. This import job will save data in a temporary directory on HDFS. The second
job will take both the old and new data and will merge them together into the final
output, preserving only the last updated value for each row.
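The effect of the second job can be illustrated with a small shell sketch. This is not Sqoop's implementation: it assumes rows are CSV lines of the form id,value,last_update_date, that the first field is the key, and that a row from the delta always supersedes the old row with the same key. The function name merge_latest and the sample rows are hypothetical.

```shell
# Hypothetical helper: merge_latest OLD_FILE NEW_FILE
# Prints one row per id; rows from NEW_FILE replace rows from OLD_FILE.
merge_latest() {
  awk -F',' '
    { row[$1] = $0 }                        # a later row overwrites an earlier one with the same id
    END { for (id in row) print row[id] }   # emit the surviving row for each id
  ' "$1" "$2" | sort -t',' -k1,1            # awk reads OLD_FILE first, then NEW_FILE
}

# Sample data (made up for illustration): previously imported rows and the delta.
old=$(mktemp)
new=$(mktemp)
printf '%s\n' '1,alice,2013-05-01 10:00:00' '2,bob,2013-05-02 11:00:00' > "$old"
printf '%s\n' '2,bob-updated,2013-05-23 09:30:00' '3,carol,2013-05-23 09:45:00' > "$new"

merged=$(merge_latest "$old" "$new")
printf '%s\n' "$merged"
rm -f "$old" "$new"
```

Row 1 is carried over unchanged, row 2 is replaced by its updated version, and row 3 is added, so the merged output holds exactly one row per id.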
As in the case of the append type, all you need to do for subsequent incremental imports
is update the value of the --last-value parameter. For convenience, it is printed out
by Sqoop on every incremental import execution.
13/03/18 08:16:36 INFO tool.ImportTool: Incremental import complete! ...
13/03/18 08:16:36 INFO tool.ImportTool: --incremental lastmodified
13/03/18 08:16:36 INFO tool.ImportTool: --check-column update_date
13/03/18 08:16:36 INFO tool.ImportTool: --last-value '1987-05-22 02:02:02'
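If you script your imports, the printed value can be captured rather than copied by hand. The sketch below assumes the log line has the quoted form shown above; the function name extract_last_value is hypothetical, and the exact log format may differ between Sqoop versions.

```shell
# Hypothetical helper: reads Sqoop log output on stdin and prints the
# value that follows --last-value, without the surrounding quotes.
extract_last_value() {
  sed -n "s/.*--last-value '\(.*\)'.*/\1/p" | tail -n 1
}

# Log lines as shown in the output above.
log="13/03/18 08:16:36 INFO tool.ImportTool: --incremental lastmodified
13/03/18 08:16:36 INFO tool.ImportTool: --check-column update_date
13/03/18 08:16:36 INFO tool.ImportTool: --last-value '1987-05-22 02:02:02'"

last_value=$(printf '%s\n' "$log" | extract_last_value)
echo "$last_value"

# The captured value would then be passed to the next run as:
#   --last-value "$last_value"
```

The double quotes around the variable matter on the next run, because the timestamp contains a space.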
3.3. Preserving the Last Imported Value
Problem
Incremental import is a great feature that you're using a lot. Shouldering the
responsibility for remembering the last imported value is getting to be a hassle.
Solution
You can take advantage of the built-in Sqoop metastore that allows you to save all
parameters for later reuse. You can create a simple incremental import job with the
following command:
sqoop job \
--create visits \