For testing purposes, an additional library (.jar files) required by the
job that is not available with the Hadoop codebase should be installed
locally on the task tracker machines. Copy the JAR files to a common
location on all the nodes.
Add the JAR files, with their full paths, to the HADOOP_CLASSPATH
variable in the hadoop-env.sh configuration file:
export HADOOP_CLASSPATH="<additional_jars>:$HADOOP_CLASSPATH"
Restart all task trackers for the changes to take effect. This method is
not recommended for production environments.
While implementing the Mapper, Reducer, and main driver classes, these
guidelines should be followed:
• The Mapper class:
° The Mapper class should extend the TableMapper class
° The map method of the Mapper class takes the rowkey of the HBase
table as its input key
° The input key is defined as an ImmutableBytesWritable object
° The other parameter, an org.apache.hadoop.hbase.client.Result
object, contains the input values as columns/column families from the
HBase table
• The Reducer class:
° The Reducer class should extend the TableReducer class
° The output key is defined as NULL
° The output value is defined as the
org.apache.hadoop.hbase.client.Put object
• The Main class:
° Configure the org.apache.hadoop.hbase.client.Scan object and
optionally define parameters such as the start row, stop row, row
filter, columns, and column families for the Scan object
° Set the record caching size for the Scan object (the default is 1,
which is not preferred for MapReduce)
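
The guidelines above can be sketched in one small job. This is only an illustrative outline, not the book's own listing: the class name HBaseSummaryJob, the table names source_table and target_table, the column family cf, and the qualifiers attr and count are all hypothetical, and the code assumes the HBase 1.x or later client API (Put.addColumn) on the classpath. The job counts how often each value of cf:attr occurs and writes the totals back as Put objects with a null output key.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;

public class HBaseSummaryJob {
    // Hypothetical column family and qualifiers, chosen for illustration only.
    static final byte[] CF = Bytes.toBytes("cf");
    static final byte[] ATTR = Bytes.toBytes("attr");
    static final byte[] COUNT = Bytes.toBytes("count");

    // Mapper: extends TableMapper; input key is the rowkey, input value is a Result.
    public static class SummaryMapper extends TableMapper<Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);

        @Override
        protected void map(ImmutableBytesWritable rowKey, Result columns, Context context)
                throws IOException, InterruptedException {
            byte[] value = columns.getValue(CF, ATTR);
            // Skip rows without the column; empty values cannot serve as a rowkey later.
            if (value != null && value.length > 0) {
                context.write(new Text(Bytes.toString(value)), ONE);
            }
        }
    }

    // Reducer: extends TableReducer; output key is null, output value is a Put.
    public static class SummaryReducer
            extends TableReducer<Text, IntWritable, ImmutableBytesWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            Put put = new Put(Bytes.toBytes(key.toString()));
            put.addColumn(CF, COUNT, Bytes.toBytes(sum));
            context.write(null, put);
        }
    }

    // Builds the Scan used as the job input.
    static Scan buildScan() {
        Scan scan = new Scan();
        scan.addFamily(CF);         // restrict the scan to one column family
        scan.setCaching(500);       // assumed value; fetch many rows per RPC
        scan.setCacheBlocks(false); // not from the text above; commonly advised for MapReduce scans
        return scan;
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = Job.getInstance(conf, "hbase-summary-sketch");
        job.setJarByClass(HBaseSummaryJob.class);
        TableMapReduceUtil.initTableMapperJob(
                "source_table", buildScan(), SummaryMapper.class,
                Text.class, IntWritable.class, job);
        TableMapReduceUtil.initTableReducerJob("target_table", SummaryReducer.class, job);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

The caching value of 500 is a starting point to tune against row size and available memory, not a recommendation taken from the text. Start and stop rows or a filter can be set on the Scan returned by buildScan() before it is passed to initTableMapperJob.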
 