The Hadoop MapReduce framework is used to process data at a large scale.
For MapReduce jobs, HBase can be used in a variety of ways: as the data
source, as the target, or as both. This section does not discuss general
MapReduce usage, as it is already covered in the previous chapter.
Hive
Hive is a data warehouse infrastructure built on top of Hadoop. Hive provides a
SQL-like query language called HiveQL that allows querying the semi-structured
data stored in Hadoop. A HiveQL query is converted into MapReduce jobs, which are
executed on the Hadoop cluster. These jobs, like any other MR (MapReduce) job,
can read and process data from sources other than Hive tables stored on HDFS.
In Hive, tables can be defined as backed by HBase tables, where the row key can
be exposed as another column when needed.
Get started with the Hive installation, table creation, and data insertion
at https://cwiki.apache.org/confluence/display/Hive/GettingStarted.
Create an HBase-backed table, as shown in the following command:
hive> CREATE TABLE hbase_tab(key int, value string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
TBLPROPERTIES ("hbase.table.name" = "hbase_tab1");
OK
Time taken: 0.152 seconds
The preceding DDL statement creates a Hive table mapped to the HBase table,
defined using TBLPROPERTIES, through the HBase storage handler. The
hbase.columns.mapping property maps the column named ":key" to the HBase
row key. The optional hbase.table.name property is required only when the
HBase table and the Hive table have different names.
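The same mapping syntax extends to multiple columns: each Hive column is paired, in order, with an entry in hbase.columns.mapping of the form family:qualifier (or :key for the row key). The following sketch uses hypothetical table and column names not taken from the text:

```sql
-- Hypothetical example: map three Hive columns, in order, to the HBase
-- row key and to two qualifiers (name, salary) in the column family cf1.
CREATE TABLE employee_details(key int, name string, salary int)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:name,cf1:salary");
```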
To load data from a Hive table (employee) into the HBase-backed table, use the
following statement:
hive> INSERT OVERWRITE TABLE hbase_tab SELECT * FROM employee;
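Once loaded, the HBase-backed table can be queried through Hive like any other table; the rows are read from the underlying HBase table at query time. A minimal sketch, assuming the table created earlier:

```sql
-- Read back through Hive; the data comes from the underlying
-- HBase table (hbase_tab1), not from files on HDFS.
SELECT * FROM hbase_tab WHERE key = 1;
```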
 