We have one column family, which we call f (short for family). In this column family, we will store two columns: the max temperature and the min temperature.
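If the table does not exist yet, it can be created from the HBase shell first. A minimal sketch follows; the table name sensors matches the hbase://sensors URI used in the script below:
# HBase shell: create the 'sensors' table with a single column family, 'f'
create 'sensors', 'f'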
Pig for MapReduce
Pig allows you to write MapReduce programs at a very high level, and inserting data
into HBase is just as easy.
Here's a Pig script that reads the sensor data from HDFS and writes it into HBase:
-- ## hdfs-to-hbase.pig
data = LOAD 'hbase-import/' using PigStorage(',')
    as (sensor_id:chararray, max:int, min:int);
-- describe data;
-- dump data;
The script then stores the data into hbase://sensors with the following STORE statement:
STORE data INTO 'hbase://sensors' USING
    org.apache.pig.backend.hadoop.hbase.HBaseStorage('f:max,f:min');
After creating the table, the first command of the script loads the data from the hbase-import directory in HDFS.
The schema for the data is defined as follows:
• sensor_id: chararray (string)
• max: int
• min: int
The commented-out describe and dump statements can be used to inspect the data: in Pig, describe gives you the structure of the data object you have, and dump outputs all of its data to the terminal.
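Uncommented, these statements produce output along the following lines. This is an illustrative grunt-shell transcript; the sample tuples are made-up values, and the actual dump depends on your input files:
grunt> describe data;
data: {sensor_id: chararray,max: int,min: int}
grunt> dump data;
(sensor-1,85,42)
(sensor-2,70,20)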
The final STORE command is the one that inserts the data into HBase. Let's analyze
how it is structured:
• INTO 'hbase://sensors': This tells Pig to connect to the sensors HBase table.
• org.apache.pig.backend.hadoop.hbase.HBaseStorage: This is the Pig class that will be used to write to HBase. Pig has adapters for multiple data stores.
• The first field in the tuple, sensor_id, will be used as the row key.
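To make the end-to-end flow concrete, here is a sketch of a matching input file and the cells it produces. The file name part-00000, the sensor IDs, and the temperature values are hypothetical, and the timestamps are elided:
$ hdfs dfs -cat hbase-import/part-00000
sensor-1,85,42
sensor-2,70,20
$ pig hdfs-to-hbase.pig
$ echo "scan 'sensors'" | hbase shell
ROW                COLUMN+CELL
 sensor-1          column=f:max, timestamp=..., value=85
 sensor-1          column=f:min, timestamp=..., value=42
 sensor-2          column=f:max, timestamp=..., value=70
 sensor-2          column=f:min, timestamp=..., value=20
As described above, each sensor_id becomes a row key, and the max and min values land in the f:max and f:min columns of that row.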
 