Installing Hive
In normal use, Hive runs on your workstation and converts your SQL query into a series of
jobs for execution on a Hadoop cluster. Hive organizes data into tables, which provide a
means for attaching structure to data stored in HDFS. Metadata — such as table schemas
— is stored in a database called the metastore.
When starting out with Hive, it is convenient to run the metastore on your local machine. In
this configuration, which is the default, the Hive table definitions that you create will be
local to your machine, so you can't share them with other users. We'll see how to configure
a shared remote metastore, which is the norm in production environments, in The
Metastore.
Installation of Hive is straightforward. As a prerequisite, you need to have the same version
of Hadoop installed locally that your cluster is running.[107] Of course, you may choose to
run Hadoop locally, either in standalone or pseudodistributed mode, while getting started
with Hive. These options are all covered in Appendix A .
WHICH VERSIONS OF HADOOP DOES HIVE WORK WITH?
Any given release of Hive is designed to work with multiple versions of Hadoop. Generally, Hive works
with the latest stable release of Hadoop, as well as supporting a number of older versions,
listed in the release notes. You don't need to do anything special to tell Hive which version of Hadoop you are using,
beyond making sure that the hadoop executable is on the path or setting the HADOOP_HOME environment
variable.
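For example, assuming Hadoop was unpacked under ~/sw (an illustrative location, not a requirement), either of the following is enough for Hive to find it:

```
% export PATH=$PATH:~/sw/hadoop-x.y.z/bin
% export HADOOP_HOME=~/sw/hadoop-x.y.z
```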
Download a release and unpack the tarball in a suitable place on your workstation:
% tar xzf apache-hive-x.y.z-bin.tar.gz
It's handy to put Hive on your path to make it easy to launch:
% export HIVE_HOME=~/sw/apache-hive-x.y.z-bin
% export PATH=$PATH:$HIVE_HOME/bin
Now type hive to launch the Hive shell:
% hive
hive>
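As a quick sanity check that the shell is working, you can list the tables Hive knows about; on a fresh installation the list will be empty:

```
hive> SHOW TABLES;
```

Type quit; (or press Ctrl-D) to leave the shell.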