Installing Hive
In normal use, Hive runs on your workstation and converts your SQL query into a series of
jobs for execution on a Hadoop cluster. Hive organizes data into tables, which provide a
means for attaching structure to data stored in HDFS. Metadata — such as table schemas
— is stored in a database called the metastore.
When starting out with Hive, it is convenient to run the metastore on your local machine. In
this configuration, which is the default, the Hive table definitions that you create will be
local to your machine, so you can't share them with other users. We'll see how to configure
a shared remote metastore, which is the norm in production environments, in The
Metastore.
Installation of Hive is straightforward. As a prerequisite, you need to have the same version
of Hadoop installed locally that your cluster is running.[107] Of course, you may choose to
run Hadoop locally, either in standalone or pseudodistributed mode, while getting started
with Hive. These options are all covered in Appendix A .
WHICH VERSIONS OF HADOOP DOES HIVE WORK WITH?
Any given release of Hive is designed to work with multiple versions of Hadoop. Generally, Hive works
with the latest stable release of Hadoop, as well as supporting a number of older versions,
listed in the release notes. You don't need to do anything special to tell Hive which version of Hadoop you are using,
beyond making sure that the hadoop executable is on the path or setting the HADOOP_HOME environment
variable.
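For example, assuming Hadoop was unpacked under ~/sw (an illustrative location, not a requirement), either of the following is enough for Hive to find it:

```
% export PATH=$PATH:~/sw/hadoop-x.y.z/bin
% export HADOOP_HOME=~/sw/hadoop-x.y.z
```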
Download a release and unpack the tarball in a suitable place on your workstation:
% tar xzf apache-hive-x.y.z-bin.tar.gz
It's handy to put Hive on your path to make it easy to launch:
% export HIVE_HOME=~/sw/apache-hive-x.y.z-bin
% export PATH=$PATH:$HIVE_HOME/bin
Now type hive to launch the Hive shell:
% hive
hive>
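As a quick sanity check that the shell is working, you can list the tables Hive knows about; on a fresh installation the list will be empty:

```
hive> SHOW TABLES;
```

Type quit; (or press Ctrl-D) to leave the shell.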