Table default.sales stats: [numFiles=1, numRows=0, totalSize=189, rawDataSize=0]
OK
Time taken: 0.6 seconds
Sqoop can generate a Hive table based on a table from an existing relational data source.
We've already imported the widgets data to HDFS, so we can generate the Hive table
definition and then load in the HDFS-resident data:
% sqoop create-hive-table --connect jdbc:mysql://localhost/hadoopguide \
> --table widgets --fields-terminated-by ','
...
14/10/29 11:54:52 INFO hive.HiveImport: OK
14/10/29 11:54:52 INFO hive.HiveImport: Time taken: 1.098 seconds
14/10/29 11:54:52 INFO hive.HiveImport: Hive import complete.
% hive
hive> LOAD DATA INPATH "widgets" INTO TABLE widgets;
Loading data to table widgets
OK
Time taken: 3.265 seconds
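With the load complete, the data is immediately queryable from Hive. A quick sanity check (hypothetical; not part of the original session, and the count will depend on your dataset):

hive> SELECT COUNT(*) FROM widgets;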
When creating a Hive table definition with a specific already-imported dataset in mind, we need to specify the delimiters used in that dataset. Otherwise, Sqoop will allow Hive to use its default delimiters (which differ from Sqoop's default delimiters).
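Concretely, Sqoop writes comma-separated fields by default, whereas Hive's default field delimiter is Ctrl-A (\001); the --fields-terminated-by ',' argument above keeps the Hive table definition in sync with the files we imported. The generated definition is roughly equivalent to the following HiveQL (a sketch, assuming the widgets schema used earlier in the chapter; the note below explains the column types):

CREATE TABLE widgets (
  id INT,
  widget_name STRING,
  price DOUBLE,
  design_date STRING,
  version INT,
  design_comment STRING
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;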
NOTE
Hive's type system is less rich than that of most SQL systems. Many SQL types do not have direct analogues in Hive. When Sqoop generates a Hive table definition for an import, it uses the best Hive type available to hold a column's values. This may result in a decrease in precision. When this occurs, Sqoop will provide you with a warning message such as this one:
14/10/29 11:54:43 WARN hive.TableDefWriter: Column design_date had to be
cast to a less precise type in Hive
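Here the column in question is design_date: MySQL's DATE type has no direct Hive analogue, so Sqoop falls back to a broader type. For the schema assumed above, the lossy mappings would be roughly:

DECIMAL(10,2)  ->  DOUBLE  (exact decimal precision is lost)
DATE           ->  STRING  (no longer a true date type)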
This three-step process of importing data to HDFS, creating the Hive table, and then loading the HDFS-resident data into Hive can be shortened to one step if you know that you want to import straight from a database directly into Hive. During an import, Sqoop can generate the Hive table definition and then load in the data. Had we not already performed the import, we could have created the widgets table in Hive, based on the copy in MySQL, with a command along these lines:
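% sqoop import --connect jdbc:mysql://localhost/hadoopguide \
> --table widgets -m 1 --hive-import

The --hive-import argument tells Sqoop to perform the import, generate the Hive table definition, and load the data into Hive as a single job.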