We have one column family, and we call it f (short for family). Now, we will store two columns, max temperature and min temperature, in this column family.
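For reference, the sensors table used later in this section, with its single f column family, can be created from the HBase shell along these lines (the exact shell prompt depends on your installation):

```
hbase> create 'sensors', 'f'
```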
Pig for MapReduce
Pig allows you to write MapReduce programs at a very high level, and inserting data into HBase is just as easy.
Here's a Pig script that reads the sensor data from HDFS and writes it in HBase:
-- ## hdfs-to-hbase.pig
data = LOAD 'hbase-import/' using PigStorage(',')
       as (sensor_id:chararray, max:int, min:int);
-- describe data;
-- dump data;
Now, store the data in hbase://sensors using the following line of code:

STORE data INTO 'hbase://sensors' USING
      org.apache.pig.backend.hadoop.hbase.HBaseStorage('f:max,f:min');
After creating the table, the first command loads the data from the hbase-import directory in HDFS.
The schema for the data is defined as follows:

sensor_id : chararray (string)
max       : int
min       : int
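To make the schema concrete, here is a small Python sketch that builds input lines in the comma-separated format PigStorage(',') expects for this schema; the sensor names and temperature values are made-up illustration data, not values from the text:

```python
# Build sample CSV lines matching the Pig schema:
# sensor_id:chararray, max:int, min:int, comma-separated (PigStorage(',')).
# The sensor names and temperatures are hypothetical sample values.
rows = [
    ("sensor1", 45, 12),
    ("sensor2", 38, 7),
]

# One line per record, fields joined by commas, as the LOAD statement expects.
lines = ["%s,%d,%d" % (sensor_id, mx, mn) for sensor_id, mx, mn in rows]
for line in lines:
    print(line)
```

A file of such lines placed under hbase-import/ in HDFS is what the LOAD statement reads.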
The describe and dump statements can be used to inspect the data; in Pig, describe will give you the structure of the data object you have, and dump will output all the data to the terminal.
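For example, with the script's data relation loaded into the Grunt shell, the two statements produce output roughly like the following (the records shown are hypothetical sample values, and the exact formatting may vary by Pig version):

```
grunt> describe data;
data: {sensor_id: chararray,max: int,min: int}
grunt> dump data;
(sensor1,45,12)
(sensor2,38,7)
```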
The final STORE command is the one that inserts the data into HBase. Let's analyze how it is structured:
• INTO 'hbase://sensors': This tells Pig to connect to the sensors HBase table.
• org.apache.pig.backend.hadoop.hbase.HBaseStorage: This is the Pig class that will be used to write to HBase. Pig has adapters for multiple data stores.
• The first field in the tuple, sensor_id, will be used as the row key.
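Putting it together, the script can be run with the pig command and the result checked from the HBase shell; the sketch below assumes Pig can reach your HBase installation, and the row key and cell values shown are hypothetical:

```
$ pig hdfs-to-hbase.pig

$ hbase shell
hbase> scan 'sensors'
ROW        COLUMN+CELL
 sensor1   column=f:max, timestamp=..., value=45
 sensor1   column=f:min, timestamp=..., value=12
```

Each input tuple becomes one row, keyed by sensor_id, with its remaining fields stored as the f:max and f:min columns.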