Hive - Hadoop: The Definitive Guide

Execution engines

Hive was originally written to use MapReduce as its execution engine, and that is still the default. It is now also possible to run Hive using Apache Tez as its execution engine, and work is underway to support Spark (see Chapter 19), too. Both Tez and Spark are general directed acyclic graph (DAG) engines that offer more flexibility and higher performance than MapReduce. For example, unlike MapReduce, where intermediate job output is materialized to HDFS, Tez and Spark can avoid replication overhead by writing the intermediate output to local disk, or even by storing it in memory (at the request of the Hive planner).

The execution engine is controlled by the hive.execution.engine property, which defaults to mr (for MapReduce). It's easy to switch the execution engine on a per-query basis, so you can see the effect of a different engine on a particular query. Set Hive to use Tez as follows:

hive> SET hive.execution.engine=tez;

Note that Tez needs to be installed on the Hadoop cluster first; see the Hive documentation for up-to-date details on how to do this.
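To compare engines on one query, the per-query switch can also be made from the command line. The following is a minimal sketch, not a definitive recipe: HIVE_BIN and run_with_engine are hypothetical names introduced here, the records table in the usage comment is assumed to exist, and the tez run assumes Tez is already installed on the cluster.

```shell
# Sketch: run the same query under different execution engines.
# HIVE_BIN and run_with_engine are hypothetical names, not part of Hive.
HIVE_BIN="${HIVE_BIN:-hive}"

# Usage: run_with_engine ENGINE QUERY
# ENGINE is passed unchanged to hive.execution.engine (for example mr or tez).
run_with_engine() {
  engine="$1"
  query="$2"
  "$HIVE_BIN" -hiveconf hive.execution.engine="$engine" -e "$query"
}

# Example (assumes a table named records exists):
#   run_with_engine mr  'SELECT COUNT(*) FROM records'
#   run_with_engine tez 'SELECT COUNT(*) FROM records'
```

Because the property is set with -hiveconf for a single invocation, the default engine for other sessions is left untouched.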

Logging

You can find Hive's error log on the local filesystem at ${java.io.tmpdir}/${user.name}/hive.log. It can be very useful when trying to diagnose configuration problems or other types of error. Hadoop's MapReduce task logs are also a useful resource for troubleshooting; see Hadoop Logs for where to find them.

On many systems, ${java.io.tmpdir} is /tmp, but if it's not, or if you want to set the logging directory to be another location, then use the following:

% hive -hiveconf hive.log.dir='/tmp/${user.name}'
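Once the log directory is known, the log file itself is easy to locate. The helper below is a rough sketch under stated assumptions: hive_log_file is a hypothetical name, the log file is assumed to keep its default name hive.log, ${java.io.tmpdir} is assumed to be /tmp when no directory is given, and ${user.name} is assumed to match the current OS user.

```shell
# Sketch: print the path where Hive's log file is expected to be.
# hive_log_file is a hypothetical helper, not a Hive command.
# Assumes the default file name hive.log; with no argument it also
# assumes java.io.tmpdir is /tmp and user.name is the current OS user.
hive_log_file() {
  log_dir="${1:-/tmp/$(id -un)}"
  printf '%s/hive.log\n' "$log_dir"
}

# Example: show the most recent log entries.
#   tail -n 50 "$(hive_log_file)"
#   tail -n 50 "$(hive_log_file /var/tmp/hive-logs)"
```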

The logging configuration is in conf/hive-log4j.properties, and you can edit this file to change log levels and other logging-related settings. However, often it's more convenient to set logging configuration for the session. For example, the following handy invocation will send debug messages to the console:

% hive -hiveconf hive.root.logger=DEBUG,console
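The same session-level setting can be applied to a single query, which keeps the verbose output limited to the statement being diagnosed. This is only a sketch: HIVE_BIN and hive_with_log_level are hypothetical names, and merging standard error into standard output is a convenience chosen here so that log messages and results can be paged together.

```shell
# Sketch: run one query with a chosen root log level sent to the console.
# HIVE_BIN and hive_with_log_level are hypothetical names, not part of Hive.
HIVE_BIN="${HIVE_BIN:-hive}"

# Usage: hive_with_log_level LEVEL QUERY
# LEVEL is a log4j level such as DEBUG, INFO, or WARN.
hive_with_log_level() {
  level="$1"
  query="$2"
  # 2>&1 merges both output streams so the caller can pipe everything.
  "$HIVE_BIN" -hiveconf hive.root.logger="$level",console -e "$query" 2>&1
}

# Example:
#   hive_with_log_level DEBUG 'SHOW TABLES' | less
```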

Hive Services

The Hive shell is only one of several services that you can run using the hive command.

You can specify the service to run using the --service option. Type hive --ser-
