[Figure: Hive architecture diagram. Labeled components: JDBC/ODBC, Web GUI, CLI, DDL, Queries, Parser, Planner, Optimizer, Metastore, Hadoop.]
Figure 11.1 Hive architecture. Queries are parsed and executed on Hadoop. The metastore is an important component that helps determine how queries will be run.
11.1.1 Installing and configuring Hive
Hive requires Java 1.6 and Hadoop version 0.17 or above. You can find the latest release of Hive at http://hadoop.apache.org/hive/releases.html . Download and extract the tarball into a directory that we call HIVE_HOME . Hadoop needs to be up and running already. In addition, you need to set up a couple of directories in HDFS for Hive to use and make them group-writable:
bin/hadoop fs -mkdir /tmp
bin/hadoop fs -mkdir /user/hive/warehouse
bin/hadoop fs -chmod g+w /tmp
bin/hadoop fs -chmod g+w /user/hive/warehouse
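The four commands above can be wrapped in a small script if you set up more than one cluster. The following is only a sketch: the names HADOOP_CMD, DRY_RUN, run, and setup_hive_dirs are made up for illustration and are not part of Hadoop or Hive. By default it only prints the commands it would run. It also handles one directory at a time, so the commands are issued in a slightly different order than above.

```shell
#!/bin/sh
# Sketch only: HADOOP_CMD, DRY_RUN, run, and setup_hive_dirs are
# illustrative names, not part of Hadoop or Hive.
HADOOP_CMD="${HADOOP_CMD:-bin/hadoop}"   # path to the hadoop launcher
DRY_RUN="${DRY_RUN:-1}"                  # 1 = print commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Create each directory Hive needs and make it group-writable.
setup_hive_dirs() {
    for dir in /tmp /user/hive/warehouse; do
        run "$HADOOP_CMD" fs -mkdir "$dir"
        run "$HADOOP_CMD" fs -chmod g+w "$dir"
    done
}

setup_hive_dirs
```

Run it from the Hadoop installation directory with DRY_RUN=0 to execute the commands for real. If a directory already exists, the mkdir step may report an error, and the script then goes on to the chmod step.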
If you let Hive manage your data completely for you, Hive will store your data under the /user/hive/warehouse directory. Hive can automatically add compression and special directory structures (such as partitions) to that data to improve query performance. It's good to let Hive manage your data if you plan on using Hive to query it. But if you already have your data in some other directories in HDFS and want to keep it there, Hive can work with it too. In that case, Hive will take your data as is and won't try to optimize your data storage for query processing. Some casual users don't understand this distinction and believe that Hive requires data to be in some special Hive format. This is not true.
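To make the distinction concrete, here is a minimal sketch of how the two cases are commonly declared in Hive's query language: a plain CREATE TABLE for data that Hive manages, and CREATE EXTERNAL TABLE with a LOCATION clause for data that stays where it is. The table names, the single line column, the /data/logs path, and the HQL_FILE, RUN_HIVE, and write_example_hql names are all hypothetical. The script writes the statements to a file and passes them to the Hive CLI only when RUN_HIVE=1 is set.

```shell
#!/bin/sh
# Sketch only: table names, column, /data/logs, HQL_FILE, RUN_HIVE, and
# write_example_hql are hypothetical and chosen for illustration.
HQL_FILE="${HQL_FILE:-/tmp/hive_tables_example.hql}"

# Write two example table definitions to the file given as $1.
write_example_hql() {
    cat > "$1" <<'EOF'
-- Managed table: Hive stores the data under /user/hive/warehouse.
CREATE TABLE logs_managed (line STRING);

-- External table: Hive reads the files where they already are in HDFS.
CREATE EXTERNAL TABLE logs_external (line STRING)
LOCATION '/data/logs';
EOF
}

write_example_hql "$HQL_FILE"

if [ "${RUN_HIVE:-0}" = "1" ]; then
    # Assumes the hive CLI is on PATH and Hadoop is running.
    hive -f "$HQL_FILE"
else
    # Default: just show the statements that would be run.
    cat "$HQL_FILE"
fi
```

Dropping an external table is generally expected to remove only Hive's metadata and leave the files under the given location untouched, which matches the "take your data as is" behavior described above. Check the documentation for your Hive release before relying on it.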