In this case, the output indicates that the Sqoop import was completed successfully. You check the HDFS data
directory by using the HDFS file system ls command and see the results of the job:
[hadoop@hc1nn sqoop]$ hdfs dfs -ls /user/hadoop/rawdata
Found 2 items
-rw-r--r-- 2 hadoop hadoop 0 2014-07-20 11:36 /user/hadoop/rawdata/_SUCCESS
-rw-r--r-- 2 hadoop hadoop 1427076 2014-07-20 11:36 /user/hadoop/rawdata/part-m-00000
These results show a _SUCCESS file and a part data file. You can dump the contents of the part file by using the
HDFS file system cat command and pipe the output to the Linux wc (word count) command, using its -l switch to
get a line count for the file:
[hadoop@hc1nn sqoop]$ hdfs dfs -cat /user/hadoop/rawdata/part-m-00000 | wc -l
20031
The output shows that 20,031 lines were imported from MySQL to HDFS, which matches the data volume
in MySQL. You can double-check the MySQL count easily:
mysql --host=hc1nn --user=sqoop --password=xxxxxxxxxxxx
mysql> select count(*) from sqoop.rawdata;
+-----------+
| count(*) |
+-----------+
| 20031 |
+-----------+
Good: logging in to MySQL as the user sqoop and getting a row count from the database table sqoop.rawdata by
using count(*) also gives 20,031 rows.
This is a good import and thus a good test of Sqoop. Although this simple example shows an import, you can
also export data to a database; for example, you can import data, modify or enrich it, and then export it to
another database.
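As a sketch of that export path, a Sqoop export back to MySQL might look like the following. The target table name exported_rawdata is hypothetical, and the command assumes a table with matching columns already exists in the sqoop database; the connection details are reused from the import example.

```shell
# Hypothetical sketch: export the HDFS part files back into a MySQL table.
# Assumes the table sqoop.exported_rawdata already exists in MySQL with
# columns that match the comma-delimited fields in the part files.
sqoop export \
  --connect jdbc:mysql://hc1nn/sqoop \
  --username sqoop -P \
  --table exported_rawdata \
  --export-dir /user/hadoop/rawdata \
  --input-fields-terminated-by ','
```

The -P switch prompts for the password interactively, which keeps it out of your shell history.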
Use Sqoop to Import Data to Hive
As you saw, Sqoop can move data to HDFS, but what if you need to move the data into the Hive data warehouse?
Although you could use a Pig Latin or Hive script, Sqoop can directly import to Hive as well.
As with HDFS, remember that Hive must be working on each data node before you attempt the Sqoop
import; testing beforehand is far better than chasing strange errors later. On each node, you run a simple
Hive show tables command, as follows:
[hadoop@hc1nn ~]$ hive
Logging initialized using configuration in file:/etc/hive/conf.dist/hive-log4j.properties
Hive history file=/tmp/hadoop/hive_job_log_ac529ba0-df48-4c65-9440-dbddf48f87b5_42666910.txt
hive> show tables;
OK
Time taken: 2.089 seconds
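Once Hive responds cleanly on each node, a direct Hive import might look like the following sketch, reusing the connection details from the HDFS example; the Hive table name rawdata and the single-mapper setting are assumptions for illustration.

```shell
# Hypothetical sketch: import the MySQL table sqoop.rawdata straight into Hive.
# --hive-import tells Sqoop to create and load a Hive table instead of
# leaving plain files in HDFS; -m 1 runs a single map task.
sqoop import \
  --connect jdbc:mysql://hc1nn/sqoop \
  --username sqoop -P \
  --table rawdata \
  --hive-import \
  --hive-table rawdata \
  -m 1
```

After the job completes, the same show tables command in Hive should list the new table.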
 