Databases Reference
In-Depth Information
ALTER
USER
sqoop
SET
search_path
=
public
,
us
;
7.3. Exporting into PostgreSQL Using pg_bulkload
Problem
You are using the
pg_bulkload
utility to load data to your PostgreSQL server. Since
Sqoop utilizes
mysqlimport
for MySQL, can Sqoop also utilize
pg_bulkload
for
PostgreSQL?
Solution
Sqoop offers a specialized connector for PostgreSQL that takes advantage of the
pg_bulkload
utility. You can use the following Sqoop command to make use of this
connector:
sqoop import
\
--connect jdbc:postgresql://postgresql.example.com/database
\
--username sqoop
\
--password sqoop
\
--connection-manager org.apache.sqoop.manager.PGBulkloadManager
\
--table cities
Discussion
pg_bulkload
is a third-party utility not distributed with PostgreSQL. You need to man‐
ually download and install it. It allows a user to load data into a PostgreSQL server at a
high speed by bypassing the write-ahead log and shared buffers. Using the
pg_bulk
load
utility with Sqoop is very simple, as Sqoop has built-in support for it.
As with other direct connectors, you need to have the
pg_bulkload
utility available on
all nodes in your Hadoop cluster because Sqoop's tasks can be executed on any Task‐
Tracker node. You can specify the path to the utility with the
pgbulkload.bin
property.
For example, if you installed the utility in
/usr/local/bin/pg_bulkload
, you can use
the following Sqoop command:
sqoop import
\
-Dpgbulkload.bin
=
/usr/local/bin/pg_bulkload
\
--connect jdbc:postgresql://postgresql.example.com/database
\
--username sqoop
\
--password sqoop
\
--connection-manager org.apache.sqoop.manager.PGBulkloadManager
\
--table cities
See Also
More information about
mysqlimport
for MySQL is in
Recipe 2.7
.