The first argument of the command, --connect, determines what type of
driver you will use for connecting to the relational database. In this case, the
command is specifying that Sqoop will use the SQL Server JDBC driver to
connect to the database.
NOTE
When specifying the connection to the database, you should use the
server name or IP address. Do not use localhost, because this
connection string will be sent to all the cluster nodes involved in the
job, and they will attempt to make their own connections. Because
localhost refers to the local computer, each node will attempt to
connect to the database as if it exists on that node, which will likely fail.
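The advice in this note can be sketched as a small shell fragment that assembles a SQL Server JDBC connection string and refuses a loopback host. The host name, database name, table name, and the is_loopback helper are hypothetical and exist only for illustration. The command is printed rather than executed, so the sketch can be tried without Sqoop installed.

```shell
# Hypothetical values for illustration; substitute your own server and database.
DB_HOST="sqlsrv01.example.com"   # a name or IP that every cluster node can resolve
DB_PORT=1433                     # default SQL Server port
DB_NAME="SalesDb"

# SQL Server JDBC URL; the jdbc:sqlserver prefix identifies the driver to use.
CONNECT_URL="jdbc:sqlserver://${DB_HOST}:${DB_PORT};databaseName=${DB_NAME}"

# Hypothetical helper: succeeds when the host is a loopback name or address,
# which each cluster node would resolve to itself.
is_loopback() {
  case "$1" in
    localhost|127.*|::1) return 0 ;;
    *) return 1 ;;
  esac
}

if is_loopback "$DB_HOST"; then
  echo "error: use a server name or IP address, not ${DB_HOST}" >&2
else
  # Print the command instead of running it.
  echo "sqoop import --connect \"${CONNECT_URL}\" --table Customers"
fi
```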
You may notice that the --connect argument contains the full connection
string for the database. Ideally, you will use Windows Authentication in the
connection string so that the password doesn't have to be specified. You can
also use the --password-file argument to tell Sqoop to use a file that
stores the password, instead of entering it as part of the command.
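As a minimal sketch of the password-file approach, the following fragment creates such a file. The file location and the password are hypothetical. Whether the path is read from the local disk or from HDFS depends on how your cluster is configured, so treat the path handling as an assumption to verify in your environment.

```shell
# Hypothetical file name and password, for illustration only.
PW_FILE="${TMPDIR:-/tmp}/sqoop-demo.password"

# Remove any earlier copy so the sketch can be rerun after the chmod below.
rm -f "$PW_FILE"

# Write the password without a trailing newline; Sqoop treats the whole file
# content, including trailing whitespace, as the password.
printf '%s' 'ExamplePassw0rd' > "$PW_FILE"

# Make the file readable by its owner only.
chmod 400 "$PW_FILE"

# Print the command instead of running it; <...> marks values you must supply.
echo "sqoop import --connect \"<connection string>\" --username <user> --password-file \"${PW_FILE}\" --table <table>"
```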
The --table argument tells Sqoop which table you intend to import from
the specified database. This is the table that Sqoop will derive its metadata
from. By default, all columns within the table are imported. You can limit
the column list by using the --columns argument:
--columns "FirstName,LastName,City,State,PostalCode"
You can also filter the rows returned by Sqoop by using the --where
argument, which enables you to specify a WHERE clause for the query:
--where "State='FL'"
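The --table, --columns, and --where arguments can be combined in one command. The sketch below uses the column list and filter shown above with a hypothetical table name, Customers, and prints the SELECT statement that the three arguments roughly correspond to. The exact SQL that Sqoop generates may differ, and the connection string is left as a placeholder.

```shell
# Hypothetical table name; the column list and filter come from the examples above.
TABLE="Customers"
COLUMNS="FirstName,LastName,City,State,PostalCode"
WHERE="State='FL'"

# Approximate query that the three arguments describe (an illustration,
# not Sqoop's exact generated SQL).
EQUIV_SQL="SELECT ${COLUMNS} FROM ${TABLE} WHERE ${WHERE}"
echo "$EQUIV_SQL"

# Print the command instead of running it; <...> marks a value you must supply.
echo "sqoop import --connect \"<connection string>\" --table ${TABLE} --columns \"${COLUMNS}\" --where \"${WHERE}\""
```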
If you need to execute a more complex query, you can replace the --table,
--columns, and --where arguments with a --query argument. This lets you
specify an arbitrary SELECT statement, but some constraints apply.
The SELECT statement must be relatively straightforward; nested tables
and common table expressions can cause problems. Because Sqoop needs