Hive - Hadoop: The Definitive Guide


Tables

A Hive table is logically made up of the data being stored and the associated metadata describing the layout of the data in the table. The data typically resides in HDFS, although it may reside in any Hadoop filesystem, including the local filesystem or S3. Hive stores the metadata in a relational database and not in, say, HDFS (see The Metastore).

In this section, we look in more detail at how to create tables, the different physical storage formats that Hive offers, and how to import data into tables.

MULTIPLE DATABASE/SCHEMA SUPPORT

Many relational databases have a facility for multiple namespaces, which allows users and applications to be segregated into different databases or schemas. Hive supports the same facility and provides commands such as CREATE DATABASE dbname, USE dbname, and DROP DATABASE dbname. You can fully qualify a table by writing dbname.tablename. If no database is specified, tables belong to the default database.
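The commands in this sidebar can be sketched as a short HiveQL session. The database name sales_db and the table orders are hypothetical and chosen only for illustration; the statements assume a working Hive installation.

```sql
-- Sketch only: sales_db and orders are hypothetical names.
CREATE DATABASE IF NOT EXISTS sales_db;

-- Make sales_db the current database; unqualified table names now resolve here.
USE sales_db;
CREATE TABLE orders (id INT, item STRING);

-- A fully qualified name works regardless of the current database.
USE default;
SELECT * FROM sales_db.orders LIMIT 10;

-- Clean up: drop the table first, since DROP DATABASE refuses a
-- non-empty database unless CASCADE is specified.
DROP TABLE sales_db.orders;
DROP DATABASE sales_db;
```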

Managed Tables and External Tables

When you create a table in Hive, by default Hive will manage the data, which means that Hive moves the data into its warehouse directory. Alternatively, you may create an external table, which tells Hive to refer to the data that is at an existing location outside the warehouse directory.
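As a rough illustration of the external case, the table below is declared with the EXTERNAL keyword and an explicit LOCATION. The table name external_table and the path /user/tom/external_table are assumptions made for this sketch, not values taken from the text above.

```sql
-- Sketch only: external_table and its LOCATION are hypothetical.
-- EXTERNAL tells Hive that it does not manage the data itself;
-- LOCATION points at a directory outside the warehouse directory.
CREATE EXTERNAL TABLE external_table (dummy STRING)
  LOCATION '/user/tom/external_table';
```

Without the EXTERNAL keyword and LOCATION clause, the same statement would create a managed table under the warehouse directory.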

The difference between the two table types is seen in the LOAD and DROP semantics. Let's consider a managed table first.

When you load data into a managed table, it is moved into Hive's warehouse directory. For example, this:

CREATE TABLE managed_table (dummy STRING);
LOAD DATA INPATH '/user/tom/data.txt' INTO TABLE managed_table;

will move the HDFS file /user/tom/data.txt into Hive's warehouse directory for the managed_table table, which is /user/hive/warehouse/managed_table. [110]
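One way to confirm where the data ended up is to ask Hive for the table's metadata and then list the directory. This is a sketch that assumes the default warehouse location shown above and that the statements are run from the Hive shell, where the dfs command is available.

```sql
-- Show the table's metadata, including its type and storage location.
DESCRIBE FORMATTED managed_table;

-- From the Hive shell, list the table's directory to see the moved file.
-- The path assumes the default warehouse location.
dfs -ls /user/hive/warehouse/managed_table;
```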
