argument indicates the relational table that will be populated from Hadoop.
Alternatively, you can use the --call argument to indicate that a stored
procedure should be called for each row of information found in the Hadoop
system.
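As a rough sketch of the stored-procedure form, the snippet below only prints the command it would run, so it can be tried without a Sqoop installation. The procedure name usp_LoadCustomer is hypothetical, and the connection string is this section's placeholder with the credentials left out. When --call is used, it takes the place of --table.

```shell
# Placeholder connection string (credentials omitted for brevity).
CONNECT='jdbc:sqlserver://Your_SqlServer;database=MsBigData'

# Print a Sqoop export command that calls a stored procedure for each record.
# $1 = stored procedure name (hypothetical), $2 = HDFS directory to export.
call_export_cmd() {
  printf 'sqoop export --connect "%s" --call %s --export-dir %s\n' \
    "$CONNECT" "$1" "$2"
}

# Preview the command; paste the printed line into a shell to run it for real.
call_export_cmd usp_LoadCustomer /MsBigData/Customers
```

The stored procedure is expected to accept one parameter per field in the exported records, so its signature has to match the layout of the files in the export directory.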
If you do not specify the --call argument, Sqoop by default generates
an INSERT statement for each record found in the Hadoop directory. By
specifying the --update-key argument and indicating a key column or
columns, you can modify this behavior to generate UPDATE statements
rather than INSERTs. You can also set the --update-mode argument to
allowinsert to indicate that rows that don't already exist in the target
table should be inserted, and rows that do exist should be updated:
sqoop export \
  --connect "jdbc:sqlserver://Your_SqlServer;database=MsBigData;Username=demo;Password=your_password;" \
  --table Customers \
  --export-dir /MsBigData/Customers \
  --update-key ID --update-mode allowinsert
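The two values that --update-mode accepts can be compared side by side. The helper below is a minimal sketch that prints the command for each mode rather than running it. The function name is illustrative, the connection string omits credentials, and whether allowinsert is available can depend on the database connector in use.

```shell
# Placeholder connection string (credentials omitted for brevity).
CONNECT='jdbc:sqlserver://Your_SqlServer;database=MsBigData'

# Print an export command for the Customers table keyed on ID.
# $1 = update mode: updateonly (Sqoop's default) or allowinsert.
update_export_cmd() {
  case "$1" in
    updateonly|allowinsert) ;;
    *) echo "unknown update mode: $1" >&2; return 1 ;;
  esac
  printf 'sqoop export --connect "%s" --table Customers --export-dir /MsBigData/Customers --update-key ID --update-mode %s\n' \
    "$CONNECT" "$1"
}

update_export_cmd updateonly    # rows whose ID already exists are updated; others are skipped
update_export_cmd allowinsert   # existing IDs are updated, new IDs are inserted
```

With updateonly, a record whose key has no match in the target table changes nothing and raises no error, which is easy to overlook when the table starts out empty.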
Exports done using Sqoop commit to the target database every 10,000 rows.
This prevents excessive resources from being tied up on the database server
managing large transactions. However, it does mean that the exports are not
atomic and that a failure during execution may leave a partial set of rows in
the target database.
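The 10,000-row figure follows from two Sqoop defaults: up to 100 records per INSERT statement, and a commit after every 100 statements in each writer task. The sketch below works through that arithmetic and prints a command that overrides both values through the sqoop.export.records.per.statement and sqoop.export.statements.per.transaction properties. The function names are illustrative, and not every connector necessarily honors these settings.

```shell
# Placeholder connection string (credentials omitted for brevity).
CONNECT='jdbc:sqlserver://Your_SqlServer;database=MsBigData'

# Rows committed together by each writer task.
# $1 = records per statement, $2 = statements per transaction.
rows_per_commit() {
  echo $(( $1 * $2 ))
}

# Print an export command with explicit batch sizes. Generic -D options
# must come directly after the tool name, before the export arguments.
batched_export_cmd() {
  printf 'sqoop export -Dsqoop.export.records.per.statement=%s -Dsqoop.export.statements.per.transaction=%s --connect "%s" --table Customers --export-dir /MsBigData/Customers\n' \
    "$1" "$2" "$CONNECT"
}

rows_per_commit 100 100      # prints 10000 (the default commit interval)
rows_per_commit 50 20        # prints 1000
batched_export_cmd 50 20     # command that would commit every 1,000 rows
```

Smaller batches shorten each transaction but do not make the export atomic; a failed job can still leave earlier batches committed.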
The -m argument (long form --num-mappers) controls the amount of parallel
activity, just as it does with the import. The same warnings and caveats
apply to its use with export.
Particularly in the case of exports, because Sqoop does its operations on a
row-by-row basis, running a large number of parallel nodes can have a very
negative impact on the target database.
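One way to keep the load on the target database modest is to start with a small mapper count and raise it only after watching the server. The helper below is a sketch that prints the command instead of running it. The function name and the choice of 2 mappers are illustrative assumptions, not recommendations from this section.

```shell
# Placeholder connection string (credentials omitted for brevity).
CONNECT='jdbc:sqlserver://Your_SqlServer;database=MsBigData'

# Print an export command limited to a given number of parallel map tasks.
# $1 = mapper count; empty, non-numeric, and zero values are rejected.
parallel_export_cmd() {
  case "$1" in
    ''|*[!0-9]*|0) echo "mapper count must be a positive integer" >&2; return 1 ;;
  esac
  printf 'sqoop export --connect "%s" --table Customers --export-dir /MsBigData/Customers -m %s\n' \
    "$CONNECT" "$1"
}

# Each map task opens its own connection and commits independently.
parallel_export_cmd 2
```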
Sqoop is a useful tool for quickly moving data in and out of Hadoop,
especially for one-time operations or when performance is not critical.