° Set block caching on the Scan object to false.
° Define the input table using the TableMapReduceUtil.
initTableMapperJob method. This method takes the source
table name, the Scan object, the Mapper class name, the MapOutputKey
and MapOutputValue classes, and the Job object.
° Define the output table using TableMapReduceUtil.
initTableReducerJob . This method takes the target table name,
the Reducer class name, and the Job object.
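The steps above can be sketched as a single driver method. This is a minimal illustration, not code from the text: the table names sourceTable and targetTable, and the classes HBaseMRTest, HBaseSourceTestMapper, and HBaseSinkTestReducer are hypothetical placeholders. It assumes the HBase client and Hadoop MapReduce libraries on the classpath and a running cluster, so it is shown without expected output.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;

// Hypothetical driver wiring HBase as both source and sink.
Configuration conf = HBaseConfiguration.create();
Job job = new Job(conf, "HBase as source and sink");
job.setJarByClass(HBaseMRTest.class);

Scan scan = new Scan();
scan.setCaching(250);       // rows fetched per RPC round trip
scan.setCacheBlocks(false); // avoid polluting the block cache during a full scan

// HBase as the data source: source table, scan, mapper, map output types.
TableMapReduceUtil.initTableMapperJob("sourceTable", scan,
    HBaseSourceTestMapper.class, Text.class, IntWritable.class, job);

// HBase as the data sink: target table and reducer.
TableMapReduceUtil.initTableReducerJob("targetTable",
    HBaseSinkTestReducer.class, job);

System.exit(job.waitForCompletion(true) ? 0 : 1);
```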
HBase integrates with the MapReduce framework in three different ways:
HBase can be used as a data source feeding the job, as a data sink storing the
job results, or in a dual role as both data source and sink.
MapReduce programming itself is out of the scope of this topic,
and the following three sections will mainly be useful to experienced
MapReduce programmers.
HBase as a data source
To use HBase as a data source, the TableInputFormat class sets up a table as
an input to the MapReduce process. Here, the Mapper class extends the TableMapper
class, which sets the output key and value types, as follows:
static class HBaseSourceTestMapper extends TableMapper<Text, IntWritable>
Then, in the job execution method, main() , create and configure a Scan instance
and set up the table mapper phase using the supplied utility as:
Scan scan = new Scan();
scan.setCaching(250);
scan.setCacheBlocks(false);
Job job = new Job(conf, "Read data from " + table);
job.setJarByClass(HBaseMRTest.class);
TableMapReduceUtil.initTableMapperJob(table, scan,
HBaseSourceTestMapper.class, Text.class, IntWritable.class, job);
The code shows how to use the TableMapReduceUtil class with its static methods
to quickly configure a job with all the required classes.
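To make the mapper side concrete, the following is a minimal sketch of what the HBaseSourceTestMapper class registered above might look like, counting occurrences of cell values in word-count style. The column family "cf" and qualifier "name" are illustrative assumptions, not taken from the text; the sketch assumes the HBase client libraries on the classpath.

```java
import java.io.IOException;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;

// Hypothetical mapper: emits each value found in cf:name with a count of one.
static class HBaseSourceTestMapper extends TableMapper<Text, IntWritable> {

  private static final IntWritable ONE = new IntWritable(1);
  private static final byte[] FAMILY = Bytes.toBytes("cf");     // assumed family
  private static final byte[] QUALIFIER = Bytes.toBytes("name"); // assumed qualifier

  @Override
  public void map(ImmutableBytesWritable rowKey, Result columns, Context context)
      throws IOException, InterruptedException {
    byte[] cell = columns.getValue(FAMILY, QUALIFIER);
    if (cell != null) {
      // The row's cell value becomes the map output key; a reducer can
      // then sum the ones to count occurrences of each value.
      context.write(new Text(Bytes.toString(cell)), ONE);
    }
  }
}
```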