Running Hive
In this section, we look at some more practical aspects of running Hive, including how to
set up Hive to run against a Hadoop cluster and a shared metastore. In doing so, we'll see
Hive's architecture in some detail.
Configuring Hive
Hive is configured using an XML configuration file like Hadoop's. The file is called
hive-site.xml and is located in Hive's conf directory. This file is where you can set
properties that you want to set every time you run Hive. The same directory contains
hive-default.xml, which documents the properties that Hive exposes and their default values.
You can override the directory in which Hive looks for hive-site.xml by passing the
--config option to the hive command:
% hive --config /Users/tom/dev/hive-conf
Note that this option specifies the containing directory, not hive-site.xml itself. It can be
useful when you have multiple site files (for different clusters, say) that you switch
between on a regular basis. Alternatively, you can set the HIVE_CONF_DIR environment
variable to the configuration directory for the same effect.
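As a rough sketch of switching between configuration directories, the following shell function checks that a directory actually contains hive-site.xml before exporting HIVE_CONF_DIR. The function name use_hive_conf is hypothetical and not part of Hive; only the HIVE_CONF_DIR variable comes from the text above.

```shell
# use_hive_conf is a hypothetical helper, not something Hive provides.
# It points HIVE_CONF_DIR at a directory for the current shell session.
use_hive_conf() {
  conf_dir="$1"
  # HIVE_CONF_DIR must name the directory that contains hive-site.xml,
  # not the file itself, so reject anything without that file inside it.
  if [ ! -f "$conf_dir/hive-site.xml" ]; then
    echo "no hive-site.xml found in: $conf_dir" >&2
    return 1
  fi
  export HIVE_CONF_DIR="$conf_dir"
}
```

After calling the function with a directory such as /Users/tom/dev/hive-conf, a plain hive invocation in the same shell should pick up that directory without needing --config.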
The hive-site.xml file is a natural place to put the cluster connection details: you can specify
the filesystem and resource manager using the usual Hadoop properties, fs.defaultFS and
yarn.resourcemanager.address (the same properties used when configuring Hadoop). If not set,
they default to the local filesystem and the local (in-process) job runner, just like they do
in Hadoop, which is very handy when trying out Hive on small trial datasets. Metastore
configuration settings (covered in The Metastore) are commonly found in hive-site.xml, too.
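To make the file format concrete, here is a minimal sketch that writes a hive-site.xml holding cluster connection details. The property values assume a pseudodistributed cluster on localhost with the resource manager on port 8032, and the scratch directory name is a placeholder chosen for this example; adjust both for a real cluster.

```shell
# Placeholder configuration directory for this example only.
conf_dir="${TMPDIR:-/tmp}/hive-conf-example"
mkdir -p "$conf_dir"

# Write a site file in the Hadoop-style XML property format.
# The values below are assumptions for a pseudodistributed cluster.
cat > "$conf_dir/hive-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost</value>
  </property>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>localhost:8032</value>
  </property>
</configuration>
EOF
```

Hive could then be pointed at this directory with the --config option or the HIVE_CONF_DIR environment variable.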
Hive also permits you to set properties on a per-session basis, by passing the -hiveconf
option to the hive command. For example, the following command sets the cluster (in this
case, to a pseudodistributed cluster) for the duration of the session:
% hive -hiveconf fs.defaultFS=hdfs://localhost \
  -hiveconf mapreduce.framework.name=yarn \
  -hiveconf yarn.resourcemanager.address=localhost:8032