HBase as a data sink
When HBase acts as a data sink, the TableOutputFormat class sets up a table as the output of the MapReduce job:
Job job = new Job(conf, "Writing data to the " + table);
job.setOutputFormatClass(TableOutputFormat.class);
job.getConfiguration().set(TableOutputFormat.OUTPUT_TABLE, table);
The preceding lines also use an implicit write buffer set up by the TableOutputFormat class. Each call to context.write() internally invokes table.put() with the given Put instance. The TableOutputFormat class also takes care of calling flushCommits() when the job is complete.
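To make the write path concrete, the following is a minimal sketch of a map-only job that emits Puts through this mechanism; the class, column family, and column names are illustrative assumptions, not part of the original text, and running it requires an HBase cluster on the classpath:

```java
import java.io.IOException;

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Illustrative mapper: parses "rowkey,value" lines and writes each row to
// HBase. Every context.write() hands the Put to TableOutputFormat, which
// buffers it and flushes the buffer when the task finishes.
public class TextToHBaseMapper
    extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    String[] fields = value.toString().split(",");       // e.g. "row1,hello"
    Put put = new Put(Bytes.toBytes(fields[0]));         // rowkey
    put.add(Bytes.toBytes("cf1"),                        // assumed family
            Bytes.toBytes("col1"),                       // assumed qualifier
            Bytes.toBytes(fields[1]));                   // cell value
    context.write(new ImmutableBytesWritable(put.getRow()), put);
  }
}
```

Because TableOutputFormat owns the table connection, the mapper never opens an HTable itself; it only emits Put objects and lets the framework handle buffering and commits.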
In typical MapReduce usage with HBase, a reducer is usually not needed, as the data is already sorted and has unique keys to be stored in the HBase tables. If a reducer is required for certain use cases, it should extend the TableReducer class, which again sets the input key and value types as:
static class HBaseTestReduce extends TableReducer<KEYIN, VALUEIN, KEYOUT>
Also, set it in the job configuration as:
TableMapReduceUtil.initTableReducerJob("customers", HBaseTestReduce.
class, job);
Here, the writes go to the region that is responsible for the rowkey that is being
written by the reduce task.
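A reducer of this shape can be sketched as follows; the aggregation logic, column family, and qualifier are illustrative assumptions for the hypothetical HBaseTestReduce class above, and the code assumes the HBase client libraries and a running cluster:

```java
import java.io.IOException;

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;

// Illustrative TableReducer: sums the counts for each key and writes one
// Put per key. The rowkey of the Put determines which region server
// receives the write.
public class HBaseTestReduce
    extends TableReducer<Text, IntWritable, ImmutableBytesWritable> {

  @Override
  protected void reduce(Text key, Iterable<IntWritable> values,
      Context context) throws IOException, InterruptedException {
    int sum = 0;
    for (IntWritable value : values) {
      sum += value.get();
    }
    Put put = new Put(Bytes.toBytes(key.toString()));    // rowkey = input key
    put.add(Bytes.toBytes("cf1"),                        // assumed family
            Bytes.toBytes("total"),                      // assumed qualifier
            Bytes.toBytes(sum));
    context.write(new ImmutableBytesWritable(put.getRow()), put);
  }
}
```

The third type parameter of TableReducer fixes the output key type, while the output value is always a mutation such as Put, which is why initTableReducerJob only needs the table name, the reducer class, and the job.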
HBase as a data source and sink
This use case mixes both of the previous ones, that is, HBase serves as a data source as well as a data sink. Let's look at a complete code example that uses HBase as both a source and a sink. This example reads the records from the Customer table for the column family cf1 and copies them to another table, CustomerTableCopy:
package com.ch4;
import java.io.IOException;
import java.util.HashMap;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;