HBase as a data sink
HBase as a data sink can also use the TableOutputFormat class, which sets up a table
as the output of the MapReduce process:
Job job = new Job(conf, "Writing data to the " + table);
job.setOutputFormatClass(TableOutputFormat.class);
job.getConfiguration().set(TableOutputFormat.OUTPUT_TABLE, table);
The preceding lines also use an implicit write buffer set up by the
TableOutputFormat class. The call to context.write() invokes the internal
table.put() method with the given instance of Put. The TableOutputFormat class also
takes care of calling flushCommits() when the job is complete.
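For example, a map-only job configured this way can emit Put instances straight from the mapper. The following is a minimal sketch, assuming text input lines of the form rowkey,value; the class name and the cf1/col1 family and qualifier are illustrative, not part of the original example:
import java.io.IOException;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class SinkWriteMapper
    extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
  @Override
  protected void map(LongWritable offset, Text line, Context context)
      throws IOException, InterruptedException {
    // Each input line is assumed to be of the form "rowkey,value"
    String[] parts = line.toString().split(",", 2);
    byte[] rowKey = Bytes.toBytes(parts[0]);
    Put put = new Put(rowKey);
    put.add(Bytes.toBytes("cf1"), Bytes.toBytes("col1"), Bytes.toBytes(parts[1]));
    // context.write() hands the Put to TableOutputFormat's write buffer;
    // flushCommits() is called for us when the task completes
    context.write(new ImmutableBytesWritable(rowKey), put);
  }
}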
In typical MapReduce usage with HBase, a reducer is usually not needed, as the data
is already sorted and has unique keys to be stored in the HBase tables. If a reducer
is required for certain use cases, it should extend the TableReducer class, which again
sets the input key and value types:
static class HBaseSourceTestReduce extends TableReducer<Text, IntWritable, ImmutableBytesWritable>
Also, set it in the job configuration as:
TableMapReduceUtil.initTableReducerJob("customers", HBaseSourceTestReduce.class, job);
Here, the writes go to the region that is responsible for the rowkey that is being
written by the reduce task.
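As an illustration, a reducer with this signature that sums per-key counts and persists the totals could look like the following minimal sketch; the cf1 family and total qualifier are assumed names, not part of the original example:
import java.io.IOException;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;

public class HBaseSourceTestReduce
    extends TableReducer<Text, IntWritable, ImmutableBytesWritable> {
  @Override
  protected void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    int sum = 0;
    for (IntWritable value : values) {
      sum += value.get();
    }
    byte[] rowKey = Bytes.toBytes(key.toString());
    Put put = new Put(rowKey);
    put.add(Bytes.toBytes("cf1"), Bytes.toBytes("total"), Bytes.toBytes(sum));
    // The Put is routed to the region serving this row key
    context.write(new ImmutableBytesWritable(rowKey), put);
  }
}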
HBase as a data source and sink
This use case is a mix of both, that is, HBase acts as a data source as well as a data sink.
Let's look at the complete code example that uses HBase as a source as well as a sink.
This example reads the records of the column family cf1 from the Customer table and
copies them to another table, CustomerTableCopy:
package com.ch4;

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.Job;
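The mapper and the job driver then complete the example. The following is a minimal sketch built on the standard TableMapReduceUtil calls; the TableCopy and CopyMapper class names and the Scan settings are illustrative assumptions rather than the original listing:
public class TableCopy {

  // Copies every cf1 cell of a row into a Put for the same row key
  static class CopyMapper
      extends TableMapper<ImmutableBytesWritable, Put> {
    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context context)
        throws IOException, InterruptedException {
      Put put = new Put(row.get());
      for (KeyValue kv : value.raw()) {
        put.add(kv); // carry each cell over unchanged
      }
      context.write(row, put);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "CustomerTableCopy");
    job.setJarByClass(TableCopy.class);

    Scan scan = new Scan();
    scan.addFamily(Bytes.toBytes("cf1")); // read only the cf1 column family

    // HBase as the source: scan the Customer table into CopyMapper
    TableMapReduceUtil.initTableMapperJob("Customer", scan, CopyMapper.class,
        ImmutableBytesWritable.class, Put.class, job);
    // HBase as the sink: write the emitted Puts to CustomerTableCopy
    TableMapReduceUtil.initTableReducerJob("CustomerTableCopy", null, job);
    job.setNumReduceTasks(0); // map-only copy job

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}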
 