For testing purposes, any additional libraries (.jar files) that the job requires and that are not part of the Hadoop codebase can be installed locally on the task tracker machines. Copy the JAR files to the same location on every node.
Add the full paths of the JAR files to the HADOOP_CLASSPATH variable in the hadoop-env.sh configuration file:
export HADOOP_CLASSPATH="<additional_jars>:$HADOOP_CLASSPATH"
Restart all task trackers for the changes to take effect. This method is not recommended for production environments.
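The classpath entry described above can be assembled from a directory of JARs rather than typed by hand. The following is a minimal sketch intended to be appended to hadoop-env.sh. The directory /opt/hadoop-extra-jars and the helper name build_jar_classpath are illustrative assumptions, not names from the text or from Hadoop; substitute the common location chosen for your nodes.

```shell
# Sketch: build a HADOOP_CLASSPATH value from every JAR in one directory.
# JAR_DIR and build_jar_classpath are hypothetical names used for illustration.

# Print a colon-separated list of the *.jar files found in directory $1.
build_jar_classpath() {
  local dir="$1" cp="" jar
  for jar in "$dir"/*.jar; do
    [ -e "$jar" ] || continue      # skip the unexpanded glob when no JAR exists
    cp="${cp:+$cp:}$jar"           # add ':' only between entries
  done
  printf '%s\n' "$cp"
}

JAR_DIR="${JAR_DIR:-/opt/hadoop-extra-jars}"   # assumed path, not from the text
extra_jars="$(build_jar_classpath "$JAR_DIR")"

# Prepend the extra JARs, keeping any classpath that is already set.
if [ -n "$extra_jars" ]; then
  export HADOOP_CLASSPATH="${extra_jars}${HADOOP_CLASSPATH:+:$HADOOP_CLASSPATH}"
fi
```

If the directory is missing or holds no JARs, the variable is left untouched, so the same fragment can be deployed to every node before the JAR files are copied there.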
While implementing the Mapper, Reducer, and main driver classes, follow these guidelines:
• The Mapper class:
° The Mapper class should extend the TableMapper class.
° The map method of the Mapper class takes the rowkey of the HBase table as its input key.
° The input key is defined as an ImmutableBytesWritable object.
° The other parameter, an org.apache.hadoop.hbase.client.Result object, contains the input values as columns/column families from the HBase table.
• The Reducer class:
° The Reducer class should extend the TableReducer class.
° The output key is defined as NULL.
° The output value is defined as an org.apache.hadoop.hbase.client.Put object.
• The Main class:
° Configure the org.apache.hadoop.hbase.client.Scan object and optionally define parameters such as the start row, stop row, row filter, columns, and column families for the scan.
° Set the record caching size on the Scan object. The default is 1, which is not preferred for MapReduce because each request to the region server then returns only a single row.