Hive and the Hadoop herd - Hadoop in Action - page 248


Figure 11.1 Hive architecture. Queries are parsed and executed on Hadoop. The metastore is an important component that helps determine how queries will be run.

[Diagram labels: Hive; JDBC/ODBC; Web GUI; CLI; DDL; Queries; Parser; Planner; Optimizer; Metastore; Hadoop]

11.1.1 Installing and configuring Hive

Hive requires Java 1.6 and Hadoop version 0.17 or above. You can find the latest release of Hive at http://hadoop.apache.org/hive/releases.html. Download and extract the tarball into a directory that we call HIVE_HOME. Hadoop needs to be up and running already. In addition, you need to set up a couple of directories in HDFS for Hive to use.

bin/hadoop fs -mkdir /tmp
bin/hadoop fs -mkdir /user/hive/warehouse
bin/hadoop fs -chmod g+w /tmp
bin/hadoop fs -chmod g+w /user/hive/warehouse
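The same four commands can be wrapped in a small script so the setup is repeatable. The following is only a sketch: the names HADOOP_BIN, DRY_RUN, run, and setup_hive_dirs are illustrative and are not part of Hadoop or Hive, and it assumes you run it from the Hadoop installation directory, as the commands above do. It deliberately keeps going if a directory such as /tmp already exists.

```shell
#!/bin/sh
# Sketch: create the HDFS directories Hive expects and make them group-writable.
# HADOOP_BIN and DRY_RUN are illustrative variables, not Hadoop or Hive settings.
HADOOP_BIN="${HADOOP_BIN:-bin/hadoop}"

run() {
    # With DRY_RUN=1, print the command instead of executing it.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

setup_hive_dirs() {
    for dir in /tmp /user/hive/warehouse; do
        # mkdir may report an error if the directory already exists;
        # the loop continues so that chmod is still applied.
        run "$HADOOP_BIN" fs -mkdir "$dir"
        run "$HADOOP_BIN" fs -chmod g+w "$dir"
    done
}
```

Calling setup_hive_dirs with DRY_RUN set to 1 prints the commands for review; calling it without DRY_RUN runs them against the cluster.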

If you let Hive manage your data completely for you, Hive will store your data under the /user/hive/warehouse directory. Hive can automatically add compression and special directory structures (such as partitions) to that data to improve query performance. It's good to let Hive manage your data if you plan on using Hive to query it. But if you already have your data in some other directories in HDFS and want to keep it there, Hive can work with it too. In that case, Hive will take your data as is and won't try to optimize your data storage for query processing. Some casual users don't understand this distinction, and believe that Hive requires data to be in some special Hive format. This is definitely not true.
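To make the distinction concrete, here is a minimal sketch of the two kinds of table definition, written as shell functions that print HiveQL. It assumes Hive's CREATE TABLE and CREATE EXTERNAL TABLE ... LOCATION statements; the table names, the single STRING column, and the function names are hypothetical and chosen only for illustration.

```shell
#!/bin/sh
# Sketch: print HiveQL for a Hive-managed table and for a table over existing data.
# Table names, the column, and any path passed in are hypothetical examples.

managed_ddl() {
    # Hive stores this table's data under its warehouse directory.
    cat <<'EOF'
CREATE TABLE logs_managed (line STRING);
EOF
}

external_ddl() {
    # $1: an existing HDFS directory that already holds the data files.
    # Hive reads the files where they are and does not move them.
    cat <<EOF
CREATE EXTERNAL TABLE logs_external (line STRING)
LOCATION '$1';
EOF
}
```

One way to use this, assuming the hive command is on your PATH and /data/logs is a directory you already have in HDFS, is hive -e "$(external_ddl /data/logs)".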
