The HBase Architecture - HBase Essentials


For testing purposes, any additional libraries (.jar files) that the job requires and that are not available with the Hadoop codebase should be installed locally on the task tracker machines. Copy the JAR files to a common location on all the nodes.

Add the JAR files, with their full paths, to the HADOOP_CLASSPATH variable in the hadoop-env.sh configuration file:

export HADOOP_CLASSPATH="<additional_jars>:$HADOOP_CLASSPATH"

Restart all task trackers for the changes to take effect. This method is not recommended for production environments.

While implementing the Mapper, Reducer, and main driver classes, follow these guidelines:

• The Mapper class:

° The Mapper class should extend the TableMapper class

° The map method of the Mapper class takes the rowkey of the HBase table as its input key

° The input key is defined as an ImmutableBytesWritable object

° The other parameter, an org.apache.hadoop.hbase.client.Result object, contains the input values as columns/column families from the HBase table

• The Reducer class:

° The Reducer class should extend the TableReducer class

° The output key is defined as NULL

° The output value is defined as an org.apache.hadoop.hbase.client.Put object

• The Main class:

° Configure the org.apache.hadoop.hbase.client.Scan object and optionally define parameters such as the start row, stop row, row filter, columns, and column families for the scan

° Set the record caching size for the Scan object (the default is 1 in older HBase releases, which is not preferred for MapReduce)
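
The guidelines above can be sketched in Java roughly as follows. This is a minimal sketch under stated assumptions, not a listing from the book: the class name RowPrefixCount, the table names source_table and target_table, the column family cf, the qualifier count, and the caching value of 500 are all hypothetical, and the job itself (counting rows per first rowkey character) was chosen only to make the three classes concrete. It also assumes an HBase 1.x or later client on the classpath, where Put.addColumn is available; older clients use Put.add instead.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;

// Hypothetical job: counts rows per first character of the rowkey.
public class RowPrefixCount {

    // Hypothetical column family and qualifier in the target table.
    private static final byte[] FAMILY = Bytes.toBytes("cf");
    private static final byte[] QUALIFIER = Bytes.toBytes("count");

    // Mapper: extends TableMapper; the input key is the rowkey
    // (ImmutableBytesWritable) and the input value is the Result.
    public static class PrefixMapper extends TableMapper<Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);

        @Override
        protected void map(ImmutableBytesWritable rowKey, Result columns, Context context)
                throws IOException, InterruptedException {
            // columns holds the cells selected by the Scan; unused in this sketch.
            String key = Bytes.toString(rowKey.get(), rowKey.getOffset(), rowKey.getLength());
            String prefix = key.isEmpty() ? "" : key.substring(0, 1);
            context.write(new Text(prefix), ONE);
        }
    }

    // Reducer: extends TableReducer; the output value is a Put.
    public static class PrefixReducer
            extends TableReducer<Text, IntWritable, ImmutableBytesWritable> {
        @Override
        protected void reduce(Text prefix, Iterable<IntWritable> counts, Context context)
                throws IOException, InterruptedException {
            int total = 0;
            for (IntWritable c : counts) {
                total += c.get();
            }
            Put put = new Put(Bytes.toBytes(prefix.toString()));
            put.addColumn(FAMILY, QUALIFIER, Bytes.toBytes(total));
            // The output key is null; the row is taken from the Put itself.
            context.write(null, put);
        }
    }

    // Main driver: configures the Scan and wires the mapper and reducer.
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = Job.getInstance(conf, "row-prefix-count");
        job.setJarByClass(RowPrefixCount.class);

        Scan scan = new Scan();
        scan.setCaching(500); // assumed value; rows fetched per scanner round trip

        TableMapReduceUtil.initTableMapperJob(
                "source_table", scan, PrefixMapper.class,
                Text.class, IntWritable.class, job);
        TableMapReduceUtil.initTableReducerJob(
                "target_table", PrefixReducer.class, job);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

A job like this would typically be packaged into a JAR and submitted with the hadoop jar command. The sketch assumes that the target table and its column family already exist.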
