Database Reference
In-Depth Information
° Set the block cache for the scan object to false.
° Define the input table using the TableMapReduceUtil.initTableMapperJob method. This method takes the source table name, the scan object, the Mapper class name, the MapOutputKey and MapOutputValue classes, and the Job object.
° Define the output table using TableMapReduceUtil.initTableReducerJob. This method takes the target table name, the Reducer class name, and the Job object.
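Taken together, the two utility calls above can be sketched as a single job-setup method. This is a hedged sketch, not the book's own listing: the table names ("sourceTable", "targetTable") and the mapper and reducer class names are placeholders for illustration.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;

public class HBaseSourceSinkJob {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "HBase as source and sink");
        job.setJarByClass(HBaseSourceSinkJob.class);

        // Source side: create a scan with block caching disabled
        Scan scan = new Scan();
        scan.setCacheBlocks(false);

        // Mapper phase: source table, scan, mapper class,
        // map output key/value classes, and the job
        TableMapReduceUtil.initTableMapperJob("sourceTable", scan,
            HBaseTestMapper.class, Text.class, IntWritable.class, job);

        // Reducer phase: target table, reducer class, and the job
        TableMapReduceUtil.initTableReducerJob("targetTable",
            HBaseTestReducer.class, job);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Note that HBaseTestMapper and HBaseTestReducer are assumed class names; the mapper and reducer themselves are discussed in the sections that follow.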
HBase integrates with the MapReduce framework in three different ways: HBase can be used as a data source feeding the job, as a data sink storing the job results, or in a dual role as both data source and sink.
Teaching MapReduce programming is out of the scope of this topic; the following three sections will be most useful to experienced MapReduce programmers.
HBase as a data source
When HBase is used as a data source, the TableInputFormat class sets up a table as an input to the MapReduce process. Here, the Mapper class extends the TableMapper class, which sets the output key and value types, as follows:
static class HBaseTestMapper extends TableMapper<Text, IntWritable>
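Filled out with a map() body, such a mapper might look like the following sketch. The column family "cf" and qualifier "qual" are assumptions for illustration, not names from the original example:

```java
import java.io.IOException;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;

// Declared as a static nested class inside the job class in the original example
class HBaseTestMapper extends TableMapper<Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(ImmutableBytesWritable rowKey, Result columns,
                       Context context) throws IOException, InterruptedException {
        // Read one cell per row; "cf" and "qual" are assumed names
        byte[] value = columns.getValue(Bytes.toBytes("cf"), Bytes.toBytes("qual"));
        if (value != null) {
            word.set(Bytes.toString(value));
            context.write(word, ONE);
        }
    }
}
```

The mapper receives one Result per scanned row, so any per-row extraction logic belongs in map().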
Then, in the job execution method, main(), create and configure a Scan instance and set up the table mapper phase using the supplied utility, as follows:
Scan scan = new Scan();
scan.setCaching(250);
scan.setCacheBlocks(false);
Job job = new Job(conf, "Read data from " + table);
job.setJarByClass(HBaseMRTest.class);
TableMapReduceUtil.initTableMapperJob(table, scan,
HBaseTestMapper.class, Text.class, IntWritable.class, job);
The code shows how to use the TableMapReduceUtil class with its static methods to quickly configure a job with all the required classes.
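On the sink side, a matching reducer extends TableReducer and writes the aggregated counts back to HBase. The following is a sketch under the same assumptions as above; the column family "cf" and qualifier "count" are illustrative names, not part of the original text:

```java
import java.io.IOException;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;

// Declared as a static nested class inside the job class in this sketch's setup
class HBaseTestReducer
        extends TableReducer<Text, IntWritable, ImmutableBytesWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {
            sum += v.get();
        }
        // Emit one Put per key; the row key is the mapper's output key
        Put put = new Put(Bytes.toBytes(key.toString()));
        put.add(Bytes.toBytes("cf"), Bytes.toBytes("count"), Bytes.toBytes(sum));
        context.write(null, put);
    }
}
```

Because initTableReducerJob already binds the target table, the reducer only needs to emit Put objects; the framework routes them to the table.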