Apache Hive
Like Cloudera Impala, Apache Hive offers a SQL-like language, called Hive QL, that can be used to manipulate Hive-based tables. The functionality of Hive QL can be extended by creating user-defined functions (UDFs), as you'll see in an example shortly.
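Before that example, it may help to see the general pattern for wiring a UDF into Hive QL. This is a minimal sketch only; the jar path, function name, class name, and table are hypothetical placeholders, not the UDF built later in this chapter:

```sql
-- Hypothetical: make a jar containing a compiled UDF class available to Hive
ADD JAR /home/hadoop/udfs/my-udf.jar;

-- Register the class under a SQL-callable name for this session
CREATE TEMPORARY FUNCTION clean_text AS 'com.example.hive.CleanText';

-- Use it like any built-in function (table and column are placeholders)
SELECT clean_text(description) FROM some_table;
```

Because the function is created as TEMPORARY, the registration lasts only for the current session.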
In this section, I use Hive version 0.10, which was installed in Chapter 7 along with the Hue application. As you
remember, Hue was installed on the server hc1nn and has the URL http://hc1nn:8888/ . Though I don't mention
Hue again in this chapter, I use the Beeswax Hive user interface to enter the scripts at the Hue URL. Here, I walk you,
step by step, through table creation, SELECT statements, joins, and WHERE clauses. To make the examples a bit more
interesting, I have sourced some real UK trade CSV files to use as data.
Note For more information and in-depth documentation on Hive and Hue, see the Apache Hive website at
hive.apache.org .
Database Creation
To begin the example, I create a database to contain this Hive work, using the CREATE DATABASE command. I name the
database and specify that it should be created only if it does not already exist:
CREATE DATABASE IF NOT EXISTS trade;
I set the current database with the USE command; in this case, I set it to trade:
USE trade;
External Table Creation
External tables are a useful first step in an ETL chain because they overlay a table definition on raw data files that already sit in an HDFS directory, giving you a Hive staging table without duplicating the data. From the staging table you can then transform the data on its journey to its final state. Before I demonstrate table creation, though, I need to move the data files that I downloaded from the UK
government data site ( data.gov.uk ) from the Linux file system on hc1nn to HDFS. (If you want to obtain the same
data set to run these examples, you can source it from http://data.gov.uk/dataset/financial-transactions-admin-spend-data-ukti .)
To start the move, I create the /data directory on HDFS as the Linux hadoop user:
[hadoop@hc1nn data]$ hdfs dfs -mkdir /data
I then copy the data files under the Linux directory /home/hadoop/data/uk_trade to this HDFS directory via the HDFS
copyFromLocal command:
[hadoop@hc1nn data]$ pwd
/home/hadoop/data
[hadoop@hc1nn data]$ ls uk_trade
ukti-admin-spend-apr-2011.csv ukti-admin-spend-jun-2012.csv
ukti-admin-spend-apr-2012.csv ukti-admin-spend-mar-2011.csv
.......
[hadoop@hc1nn data]$ hdfs dfs -copyFromLocal uk_trade /data
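With the files now under /data/uk_trade in HDFS, an external staging table of the kind described above could be defined over that directory. The following is a minimal sketch only; the table name, column names, and types are hypothetical placeholders, not the schema used for the trade data later in this chapter:

```sql
-- Hypothetical external table over the raw CSV files in /data/uk_trade.
-- DROP TABLE would remove only the metadata; the HDFS files would remain.
CREATE EXTERNAL TABLE IF NOT EXISTS raw_spend (
  department   STRING,
  expense_type STRING,
  amount       STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/uk_trade';
```

Declaring the columns as STRING in a staging table is a common choice, since it lets malformed rows load without error; type conversion can happen in the transformation step.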