The Hadoop MapReduce framework is used to process data at a large scale.
For MapReduce jobs, HBase can be used in a variety of ways: as the data
source, as the target, or as both. This section does not discuss general
MapReduce usage, as it is already covered in the previous chapter.
Hive
Hive is a data warehouse infrastructure built on top of Hadoop. Hive provides a
SQL-like query language called HiveQL that allows querying the semi-structured
data stored in Hadoop. A HiveQL query is converted into MapReduce jobs, which are
executed on the Hadoop cluster. These jobs, like any other MR (MapReduce) job,
can read and process data from sources other than Hive tables stored on HDFS.
In Hive, tables can be defined as backed by HBase tables, where the row key can
be exposed as another column when needed.
Get started with the Hive installation, table creation, and data insertion
at https://cwiki.apache.org/confluence/display/Hive/GettingStarted.
Create an HBase-backed table, as shown in the following command:
hive> CREATE TABLE hbase_tab(key int, value string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
TBLPROPERTIES ("hbase.table.name" = "hbase_tab1");
OK
Time taken: 0.152 seconds
The preceding DDL statement creates a Hive table mapped to the HBase table,
defined using TBLPROPERTIES, through the HBase storage handler. The
hbase.columns.mapping property maps the column named ":key" to the HBase
row key. The optional hbase.table.name property is required only when the
HBase table and the Hive table have different names.
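The same mapping syntax extends to multiple columns: each Hive column is paired, in order, with an entry in hbase.columns.mapping of the form family:qualifier (or :key for the row key). The following sketch uses hypothetical table and column names not taken from the text:

```sql
-- Hypothetical example: map three Hive columns, in order, to the HBase
-- row key and to two qualifiers (name, salary) in the column family cf1.
CREATE TABLE employee_details(key int, name string, salary int)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:name,cf1:salary");
```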
To load data from a Hive table (employee) into the HBase-backed table, use the
following statement:
hive> INSERT OVERWRITE TABLE hbase_tab SELECT * FROM employee;
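Once loaded, the HBase-backed table can be queried through Hive like any other table; the rows are read from the underlying HBase table at query time. A minimal sketch, assuming the table created earlier:

```sql
-- Read back through Hive; the data comes from the underlying
-- HBase table (hbase_tab1), not from files on HDFS.
SELECT * FROM hbase_tab WHERE key = 1;
```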
 