Databases Reference
In-Depth Information
2.3. Importing Only a Subset of Data
Problem
Instead of importing an entire table, you need to transfer only a subset of the rows based
on various conditions that you can express in the form of a SQL statement with a WHERE
clause.
Solution
Use the command-line parameter --where to specify a SQL condition that the imported
data should meet. For example, to import only USA cities from the table cities , you
can issue the following Sqoop command:
sqoop import \
--connect jdbc:mysql://mysql.example.com/sqoop \
--username sqoop \
--password sqoop \
--table cities \
--where "country = 'USA'"
Discussion
Sqoop will propagate the content of the --where parameter as is to all generated queries
that fetch data. This provides a powerful ability by which to express any condition that
your particular database server can process. Any special functions, conversions, or even
user-defined functions can be used. Because the SQL fragment will be propagated into
generated queries without any Sqoop processing, any invalid fragments may result in
nonintuitive exceptions that are hard to debug. This parameter can be confusing for
new Sqoop users.
When using the --where parameter, keep in mind the parallel nature of Sqoop transfers.
Data will be transferred in several concurrent tasks. Any expensive function call will
put a significant performance burden on your database server. Advanced functions
could lock certain tables, preventing Sqoop from transferring data in parallel. This will
adversely affect transfer performance. For efficient advanced filtering, run the filtering
query on your database prior to import, save its output to a temporary table and run
Sqoop to import the temporary table into Hadoop without the --where parameter.
2.4. Protecting Your Password
Problem
Typing your password into the command-line interface is insecure. It can be easily
retrieved from listing the operating system's running processes.
Search WWH ::




Custom Search