The wget command downloads the tarred and compressed connector library file from the web address
http://dev.mysql.com/get/Downloads/Connector-J/. Once the file is downloaded, you unzip and untar it, and
then move it to the correct location so that Sqoop can use it.
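As a sketch of that download step (the exact URL path under Downloads/Connector-J is an assumption; the archive name matches the listing below), the command would look something like this:
[root@hc1nn ~]# wget http://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-5.1.22.tar.gz
Once the download completes, a directory listing confirms that the archive is present: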
[root@hc1nn ~]# ls -l mysql-connector-java-5.1.22.tar.gz
-rw-r--r--. 1 root root 4028047 Sep 6 2012 mysql-connector-java-5.1.22.tar.gz
This command shows the downloaded connector library, while the next commands show the file being unzipped
with the gunzip command and unpacked with the tar command, using its extract (x) and file (f) options:
[root@hc1nn ~]# gunzip mysql-connector-java-5.1.22.tar.gz
[root@hc1nn ~]# tar xf mysql-connector-java-5.1.22.tar
[root@hc1nn ~]# ls -lrt
total 9604
drwxr-xr-x. 4 root root 4096 Sep 6 2012 mysql-connector-java-5.1.22
-rw-r--r--. 1 root root 9809920 Sep 6 2012 mysql-connector-java-5.1.22.tar
Now, you copy the connector library to the /usr/lib/sqoop/lib directory so that it is available to Sqoop when it
attempts to connect to a MySQL database:
[root@hc1nn ~]# cp mysql-connector-java-5.1.22/mysql-connector-java-5.1.22-bin.jar /usr/lib/sqoop/lib/
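To confirm that the driver is where Sqoop expects it, you can list the destination directory; the path below simply repeats the copy destination used above:
[root@hc1nn ~]# ls -l /usr/lib/sqoop/lib/mysql-connector-java-5.1.22-bin.jar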
For this example installation, I use the Linux hadoop account. In that user's $HOME/.bashrc Bash shell
configuration file, I have defined some Hadoop and MapReduce variables, as follows:
#######################################################
# Set up Sqoop variables
# For each user who will be submitting MapReduce jobs using MapReduce v2 (YARN), or running
# Pig, Hive, or Sqoop in a YARN installation, set HADOOP_MAPRED_HOME and the related
# Hadoop environment variables, as follows:
export HADOOP_CONF_DIR=/etc/hadoop/conf
export HADOOP_COMMON_HOME=/usr/lib/hadoop
export HADOOP_HDFS_HOME=/usr/lib/hadoop-hdfs
export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce
export YARN_HOME=/usr/lib/hadoop-yarn/
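To apply these settings to the current session and check that Sqoop picks them up, something like the following should work; the prompt assumes the hadoop account on the same server, and sqoop version simply reports the installed Sqoop release:
[hadoop@hc1nn ~]$ source ~/.bashrc
[hadoop@hc1nn ~]$ sqoop version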
Use Sqoop to Import Data to HDFS
To import data from a database, you use the Sqoop import statement. For my MySQL database example, I use an
options file containing the connection and access information. Because these details are held in a single file, this
method requires less typing each time the task is repeated. The file that will be used to write table data to HDFS
contains nine lines.
The import line tells Sqoop that data will be imported from the database to HDFS. The -- connect option with a
connect string of jdbc:mysql://hc1nn/sqoop tells Sqoop that JDBC will be used to connect to a MySQL database on
server hc1nn called “sqoop.” I use the Linux cat command to show the contents of the Sqoop options file.
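As a sketch of the format only (the username, password, and table name below are placeholders, not the values used in the example), a nine-line options file for this import might look like the following; Sqoop expects each option and each value on its own line:
import
--connect
jdbc:mysql://hc1nn/sqoop
--username
<mysql user>
--password
<mysql password>
--table
<table name>
Such a file is typically passed to Sqoop with a command along the lines of sqoop --options-file ./import.txt, which reads the options from the file as if they had been typed on the command line.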
 