Apache Hive
Like Cloudera Impala, Apache Hive offers an SQL-type language called Hive QL that can be used to manipulate Hive-based tables. The functionality of Hive QL can be extended by creating user-defined functions (UDFs), as you'll see in an example shortly.
In this section, I use Hive version 0.10, which was installed in Chapter 7 along with the Hue application. As you remember, Hue was installed on the server hc1nn and has the URL http://hc1nn:8888/. Though I don't mention Hue again in this chapter, I use the Beeswax Hive user interface at the Hue URL to enter the scripts. Here, I walk you, step by step, through table creation, SELECT statements, joins, and WHERE clauses. To make the examples a bit more interesting, I have sourced some real UK trade CSV files to use as data.
■ Note For more information and in-depth documentation on Hive and Hue, see the Apache Hive website at hive.apache.org.
Database Creation
To begin the example, I create a database to contain this Hive work, using the CREATE DATABASE command. I name the database and specify that it should be created only if it does not already exist:
CREATE DATABASE IF NOT EXISTS trade;
I set the current database with the USE command; in this case, I set it to trade:
USE trade;
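If you want to confirm that the database was created before switching to it, Hive's SHOW DATABASES and DESCRIBE DATABASE commands can be run from the same Beeswax interface. This is an optional sanity check of my own, not part of the book's script:

```sql
-- List all databases; the new trade database should appear.
SHOW DATABASES;

-- Show the trade database's properties, including its HDFS location.
DESCRIBE DATABASE trade;
```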
External Table Creation
External tables are a useful step in an ETL chain because they offer the opportunity to move raw data files from an HDFS directory into a Hive staging table. From the staging table, you can transform the data on its journey to its final state. Before I demonstrate table creation, though, I need to move the data files that I downloaded from the UK government data site (data.gov.uk) from the Linux file system on hc1nn to HDFS. (If you want to obtain the same data set to run these examples, you can source it from http://data.gov.uk/dataset/financial-transactions-.)
To start the move, I create the /data directory on HDFS as the Linux hadoop user:
[hadoop@hc1nn data]$ hdfs dfs -mkdir /data
I then copy the data files under the Linux directory /home/hadoop/data/uk_trade to this HDFS directory via the copyFromLocal HDFS command:
[hadoop@hc1nn data]$ pwd
/home/hadoop/data
[hadoop@hc1nn data]$ ls uk_trade
ukti-admin-spend-apr-2011.csv ukti-admin-spend-jun-2012.csv
ukti-admin-spend-apr-2012.csv ukti-admin-spend-mar-2011.csv
.......
[hadoop@hc1nn data]$ hdfs dfs -copyFromLocal uk_trade /data
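With the CSV files in HDFS, the staging step described above can be sketched as an external table defined over the /data/uk_trade directory. The table name and columns below are illustrative assumptions on my part; the UK Trade & Investment spend files carry fields along these lines, but the exact schema used later in the chapter may differ:

```sql
-- Hypothetical external staging table over the uploaded CSV files.
-- Column names and types are assumptions for illustration, not the
-- book's exact schema. Dropping an EXTERNAL table removes only the
-- metadata; the files under /data/uk_trade are left untouched.
CREATE EXTERNAL TABLE IF NOT EXISTS rawtrans (
  dept          STRING,
  entity        STRING,
  paydate       STRING,
  expense_type  STRING,
  expense_area  STRING,
  supplier      STRING,
  txnumber      STRING,
  amount        STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/uk_trade';
```

A quick hdfs dfs -ls /data/uk_trade beforehand confirms the files arrived; once the table exists, a SELECT * FROM rawtrans LIMIT 10; in Beeswax shows the raw rows, header lines included, ready for the transform step.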