Hive - Hadoop: The Definitive Guide


Running Hive

In this section, we look at some more practical aspects of running Hive, including how to set up Hive to run against a Hadoop cluster and a shared metastore. In doing so, we'll see Hive's architecture in some detail.

Configuring Hive

Hive is configured using an XML configuration file like Hadoop's. The file is called hive-site.xml and is located in Hive's conf directory. This file is where you can set properties that you want to set every time you run Hive. The same directory contains hive-default.xml, which documents the properties that Hive exposes and their default values.

You can override the directory in which Hive looks for hive-site.xml by passing the --config option to the hive command:

% hive --config /Users/tom/dev/hive-conf

Note that this option specifies the containing directory, not hive-site.xml itself. It can be useful when you have multiple site files (for different clusters, say) that you switch between on a regular basis. Alternatively, you can set the HIVE_CONF_DIR environment variable to the configuration directory for the same effect.

The hive-site.xml file is a natural place to put the cluster connection details: you can specify the filesystem and resource manager using the usual Hadoop properties, fs.defaultFS and yarn.resourcemanager.address (see Appendix A for more details on configuring Hadoop). If these are not set, they default to the local filesystem and the local (in-process) job runner, just as they do in Hadoop. This is very handy when trying out Hive on small trial datasets. Metastore configuration settings (covered in "The Metastore") are commonly found in hive-site.xml, too.
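As a minimal sketch, the shell script below writes a hive-site.xml that carries the cluster connection details just described. It assumes a pseudodistributed cluster on localhost with the YARN resource manager on its default port, 8032. It also sets mapreduce.framework.name to yarn, which Hadoop needs before it will submit jobs to YARN instead of the local runner. The output directory ./hive-conf-pseudo is a hypothetical name. Because the file sits in its own directory, you can select it with --config or HIVE_CONF_DIR.

```shell
#!/bin/sh
# Sketch: write a hive-site.xml holding cluster connection details.
# Assumptions: a pseudodistributed cluster on localhost, with the YARN
# resource manager on its default port 8032. The directory name is hypothetical.
set -e

CONF_DIR="${CONF_DIR:-./hive-conf-pseudo}"
mkdir -p "$CONF_DIR"

# Same <configuration>/<property> format as Hadoop's own site files.
# The quoted 'EOF' stops the shell from expanding anything in the XML.
cat > "$CONF_DIR/hive-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost</value>
  </property>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>localhost:8032</value>
  </property>
</configuration>
EOF

# --config takes the containing directory, not the file itself.
echo "hive --config $CONF_DIR"
```

The script does not start Hive. It prints the command that would open a session using this configuration directory. Keeping each cluster's hive-site.xml in a separate directory like this lets you switch clusters without editing the file.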

Hive also permits you to set properties on a per-session basis by passing the -hiveconf option to the hive command. For example, the following command sets the cluster (in this case, to a pseudodistributed cluster) for the duration of the session:

% hive -hiveconf fs.defaultFS=hdfs://localhost \
-hiveconf mapreduce.framework.name=yarn \
-hiveconf yarn.resourcemanager.address=localhost:8032
