Tables
A Hive table is logically made up of the data being stored and the associated metadata
describing the layout of the data in the table. The data typically resides in HDFS, although it
may reside in any Hadoop filesystem, including the local filesystem or S3. Hive stores the
metadata in a relational database and not in, say, HDFS (see The Metastore).
In this section, we look in more detail at how to create tables, the different physical storage
formats that Hive offers, and how to import data into tables.
MULTIPLE DATABASE/SCHEMA SUPPORT
Many relational databases have a facility for multiple namespaces, which allows users and applications to
be segregated into different databases or schemas. Hive supports the same facility and provides commands
such as CREATE DATABASE dbname, USE dbname, and DROP DATABASE dbname. You can
fully qualify a table by writing dbname.tablename. If no database is specified, tables belong to the
default database.
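As a minimal sketch of these commands in use, the following HiveQL creates a database, works inside it, and refers to a table by its fully qualified name. The names sales_db and orders are hypothetical and chosen only for illustration.

```sql
-- Hypothetical database and table names, for illustration only.
CREATE DATABASE IF NOT EXISTS sales_db;
USE sales_db;

-- Created in sales_db because it is the current database.
CREATE TABLE orders (id INT, item STRING);

-- A fully qualified name works regardless of the current database.
SELECT * FROM sales_db.orders LIMIT 10;

-- Switch back to the default database.
USE default;

-- CASCADE also drops the tables inside the database; without it,
-- dropping a non-empty database fails.
DROP DATABASE IF EXISTS sales_db CASCADE;
```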
Managed Tables and External Tables
When you create a table in Hive, by default Hive will manage the data, which means that
Hive moves the data into its warehouse directory. Alternatively, you may create an external
table, which tells Hive to refer to the data that is at an existing location outside the
warehouse directory.
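For comparison with the managed case, an external table might be declared as in the sketch below. The table name and the LOCATION path are assumptions made for illustration; the EXTERNAL keyword and the LOCATION clause are what tell Hive that the data lives outside its warehouse directory.

```sql
-- Hypothetical table name and path, for illustration only.
-- Hive records the location in the metastore but does not move the data.
CREATE EXTERNAL TABLE external_table (dummy STRING)
  LOCATION '/user/tom/external_table';
```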
The difference between the two table types is seen in the LOAD and DROP semantics. Let's
consider a managed table first.
When you load data into a managed table, it is moved into Hive's warehouse directory. For
example, this:
CREATE TABLE managed_table (dummy STRING);
LOAD DATA INPATH '/user/tom/data.txt' INTO TABLE managed_table;
will move the file hdfs://user/tom/data.txt into Hive's warehouse directory for the
managed_table table, which is hdfs://user/hive/warehouse/managed_table. [110]
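One way to confirm where the file ended up is sketched below. It assumes the commands are run from the Hive shell and that the warehouse directory is the /user/hive/warehouse path used above; a differently configured installation would show another location.

```sql
-- Print the table's metadata, including its Location and Table Type
-- (MANAGED_TABLE for a managed table).
DESCRIBE FORMATTED managed_table;

-- List the table's directory from within the Hive shell; the loaded
-- data.txt should appear here and no longer under /user/tom.
dfs -ls /user/hive/warehouse/managed_table;
```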