We have one column family, and we call it f (short for family). Now, we will store two columns, max temperature and min temperature, in this column family.
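For reference, the sensors table used later in this section, with its single f column family, can be created from the HBase shell along these lines (the exact shell prompt depends on your installation):

```
hbase> create 'sensors', 'f'
```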
Pig for MapReduce
Pig allows you to write MapReduce programs at a very high level, and inserting data into HBase is just as easy.
Here's a Pig script that reads the sensor data from HDFS and writes it in HBase:
-- ## hdfs-to-hbase.pig
data = LOAD 'hbase-import/' using PigStorage(',')
       as (sensor_id:chararray, max:int, min:int);
-- describe data;
-- dump data;
Now, store the data in hbase://sensors using the following line of code:

STORE data INTO 'hbase://sensors' USING
      org.apache.pig.backend.hadoop.hbase.HBaseStorage('f:max,f:min');
After creating the table, the first command loads the data from the hbase-import directory in HDFS.
The schema for the data is defined as follows:

sensor_id : chararray (string)
max       : int
min       : int
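To make the schema concrete, here is a small Python sketch that builds input lines in the comma-separated format PigStorage(',') expects for this schema; the sensor names and temperature values are made-up illustration data, not values from the text:

```python
# Build sample CSV lines matching the Pig schema:
# sensor_id:chararray, max:int, min:int, comma-separated (PigStorage(',')).
# The sensor names and temperatures are hypothetical sample values.
rows = [
    ("sensor1", 45, 12),
    ("sensor2", 38, 7),
]

# One line per record, fields joined by commas, as the LOAD statement expects.
lines = ["%s,%d,%d" % (sensor_id, mx, mn) for sensor_id, mx, mn in rows]
for line in lines:
    print(line)
```

A file of such lines placed under hbase-import/ in HDFS is what the LOAD statement reads.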
The describe and dump statements can be used to inspect the data; in Pig, describe will give you the structure of the data object you have, and dump will output all the data to the terminal.
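For example, with the script's data relation loaded into the Grunt shell, the two statements produce output roughly like the following (the records shown are hypothetical sample values, and the exact formatting may vary by Pig version):

```
grunt> describe data;
data: {sensor_id: chararray,max: int,min: int}
grunt> dump data;
(sensor1,45,12)
(sensor2,38,7)
```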
The final STORE command is the one that inserts the data into HBase. Let's analyze how it is structured:
• INTO 'hbase://sensors': This tells Pig to connect to the sensors HBase table.
• org.apache.pig.backend.hadoop.hbase.HBaseStorage: This is the Pig class that will be used to write to HBase. Pig has adapters for multiple data stores.
• The first field in the tuple, sensor_id, will be used as the row key.
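Putting it together, the script can be run with the pig command and the result checked from the HBase shell; the sketch below assumes Pig can reach your HBase installation, and the row key and cell values shown are hypothetical:

```
$ pig hdfs-to-hbase.pig

$ hbase shell
hbase> scan 'sensors'
ROW        COLUMN+CELL
 sensor1   column=f:max, timestamp=..., value=45
 sensor1   column=f:min, timestamp=..., value=12
```

Each input tuple becomes one row, keyed by sensor_id, with its remaining fields stored as the f:max and f:min columns.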