Database Reference
In-Depth Information
° Set the block cache for the scan object to false.
° Define the input table using the TableMapReduceUtil.initTableMapperJob method. This method takes the source table name, the scan object, the Mapper class name, the MapOutputKey and MapOutputValue classes, and the Job object.
° Define the output table using TableMapReduceUtil.initTableReducerJob. This method takes the target table name, the Reducer class name, and the Job object.
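Taken together, the two utility calls above can be sketched as a single job-setup method. This is a hedged sketch, not the book's own listing: the table names ("sourceTable", "targetTable") and the mapper and reducer class names are placeholders for illustration.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;

public class HBaseSourceSinkJob {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "HBase as source and sink");
        job.setJarByClass(HBaseSourceSinkJob.class);

        // Source side: create a scan with block caching disabled
        Scan scan = new Scan();
        scan.setCacheBlocks(false);

        // Mapper phase: source table, scan, mapper class,
        // map output key/value classes, and the job
        TableMapReduceUtil.initTableMapperJob("sourceTable", scan,
            HBaseTestMapper.class, Text.class, IntWritable.class, job);

        // Reducer phase: target table, reducer class, and the job
        TableMapReduceUtil.initTableReducerJob("targetTable",
            HBaseTestReducer.class, job);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Note that HBaseTestMapper and HBaseTestReducer are assumed class names; the mapper and reducer themselves are discussed in the sections that follow.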
HBase integrates with the MapReduce framework in three different ways: HBase can be used as a data source feeding the job, as a data sink storing the job results, or in a dual role as both data source and sink.
Teaching MapReduce programming is out of the scope of this topic; the following three sections will be most useful to experienced MapReduce programmers.
HBase as a data source
When HBase is used as a data source, the TableInputFormat class sets up a table as an input to the MapReduce process. Here, the Mapper class extends the TableMapper class, which sets the output key and value types, as follows:
static class HBaseTestMapper extends TableMapper<Text, IntWritable>
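Filled out with a map() body, such a mapper might look like the following sketch. The column family "cf" and qualifier "qual" are assumptions for illustration, not names from the original example:

```java
import java.io.IOException;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;

// Declared as a static nested class inside the job class in the original example
class HBaseTestMapper extends TableMapper<Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(ImmutableBytesWritable rowKey, Result columns,
                       Context context) throws IOException, InterruptedException {
        // Read one cell per row; "cf" and "qual" are assumed names
        byte[] value = columns.getValue(Bytes.toBytes("cf"), Bytes.toBytes("qual"));
        if (value != null) {
            word.set(Bytes.toString(value));
            context.write(word, ONE);
        }
    }
}
```

The mapper receives one Result per scanned row, so any per-row extraction logic belongs in map().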
Then, in the job execution method, main(), create and configure a Scan instance and set up the table mapper phase using the supplied utility, as follows:
Scan scan = new Scan();
scan.setCaching(250);
scan.setCacheBlocks(false);
Job job = new Job(conf, "Read data from " + table);
job.setJarByClass(HBaseMRTest.class);
TableMapReduceUtil.initTableMapperJob(table, scan,
HBaseTestMapper.class, Text.class, IntWritable.class, job);
The code shows how to use the TableMapReduceUtil class with its static methods to quickly configure a job with all the required classes.
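On the sink side, a matching reducer extends TableReducer and writes the aggregated counts back to HBase. The following is a sketch under the same assumptions as above; the column family "cf" and qualifier "count" are illustrative names, not part of the original text:

```java
import java.io.IOException;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;

// Declared as a static nested class inside the job class in this sketch's setup
class HBaseTestReducer
        extends TableReducer<Text, IntWritable, ImmutableBytesWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {
            sum += v.get();
        }
        // Emit one Put per key; the row key is the mapper's output key
        Put put = new Put(Bytes.toBytes(key.toString()));
        put.add(Bytes.toBytes("cf"), Bytes.toBytes("count"), Bytes.toBytes(sum));
        context.write(null, put);
    }
}
```

Because initTableReducerJob already binds the target table, the reducer only needs to emit Put objects; the framework routes them to the table.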