• We are specifying the column names for the max and min fields (f:max and
f:min, respectively). Note that we have to qualify each column with its
column family (f:).
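To illustrate how those column names are used, the STORE step of the script might look like the following sketch. The relation name sensor_stats and its field layout are assumptions for illustration; HBaseStorage treats the first field of each tuple as the row key and maps the remaining fields to the listed columns:

```
-- Hypothetical final step of hdfs-to-hbase.pig.
-- Assumes a relation sensor_stats with fields (id, max, min),
-- where id becomes the HBase row key.
STORE sensor_stats INTO 'hbase://sensors'
    USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('f:max f:min');
```

The column list 'f:max f:min' is what ties the tuple fields to the qualified columns discussed above.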
Before running this script, we need to create an HBase table called sensors . We
can do this from the HBase shell, as follows:
$ hbase shell
hbase> create 'sensors', 'f'
hbase> quit
Then, run the Pig script as follows:
$ pig hdfs-to-hbase.pig
Now watch the console output. Pig will execute the script as a MapReduce job.
Even though we are only importing two small files here, we can insert a fairly
large amount of data by exploiting the parallelism of MapReduce.
At the end of the run, Pig will print out some statistics:
Input(s):
Successfully read 7 records (591 bytes) from:
"hdfs://quickstart.cloudera:8020/user/cloudera/hbase-import"
Output(s):
Successfully stored 7 records in: "hbase://sensors"
Looks good! We should have seven rows in our HBase sensors table. We can
inspect the table from the HBase shell with the following commands:
$ hbase shell
hbase> scan 'sensors'
This is how your output might look:
ROW       COLUMN+CELL
 sensor11 column=f:max, timestamp=1412373703149, value=90
 sensor11 column=f:min, timestamp=1412373703149, value=70
 sensor22 column=f:max, timestamp=1412373703177, value=80
 sensor22 column=f:min, timestamp=1412373703177, value=70