Performance Optimization - HBase Design Patterns - page 103

We covered the Java API earlier. Let's start with how to insert data using MapReduce.

Importing data into HBase using MapReduce

MapReduce is the distributed processing engine of Hadoop. MapReduce programs usually read their input from HDFS and write their output back to HDFS. HBase supports MapReduce as well: it can be both the source and the sink for MapReduce programs. As a source, HBase supplies the data that a MapReduce program reads; as a sink, it receives the results that the program writes.

The following diagram illustrates various sources and sinks for MapReduce:

[Diagram: three MapReduce scenarios. (1) Input (HDFS) -> MapReduce -> Output (HDFS); (2) Input (HDFS) -> MapReduce -> Output (HBase); (3) Input (HBase) -> MapReduce -> Output (HBase).]

The diagram we just saw can be summarized as follows:

| Scenario | Source | Sink  | Description |
|----------|--------|-------|-------------|
| 1        | HDFS   | HDFS  | A typical MapReduce job: it reads data from HDFS and writes the results back to HDFS. |
| 2        | HDFS   | HBase | Imports data from HDFS into HBase. This is a very common way to load data into HBase for the first time. |
| 3        | HBase  | HBase | Data is read from HBase and written to HBase. The source and the sink are most likely two separate HBase clusters. This is usually used for backups and mirroring. |
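In practice, the three scenarios summarized above differ mainly in which input and output format the job is wired to. The following is a minimal sketch of that mapping in plain Java, not a working MapReduce job. The class `MapReduceScenarios` and its members are hypothetical names introduced for illustration. The format class names are held as strings so that the sketch compiles without Hadoop or HBase on the classpath. `TableInputFormat` and `TableOutputFormat` are the HBase table formats; `TextInputFormat` and `TextOutputFormat` are an assumption for the HDFS side, which could equally be any other file-based format.

```java
/**
 * Models the three source/sink scenarios as data. Plain Java only:
 * Hadoop and HBase class names are strings, so no jars or cluster are needed.
 */
final class MapReduceScenarios {

    /** Where a MapReduce job reads from or writes to. */
    enum Store { HDFS, HBASE }

    /** The three source/sink combinations from the summary. */
    enum Scenario {
        HDFS_TO_HDFS(1, Store.HDFS, Store.HDFS),
        HDFS_TO_HBASE(2, Store.HDFS, Store.HBASE),
        HBASE_TO_HBASE(3, Store.HBASE, Store.HBASE);

        final int id;
        final Store source;
        final Store sink;

        Scenario(int id, Store source, Store sink) {
            this.id = id;
            this.source = source;
            this.sink = sink;
        }

        /** Input format a job for this scenario would typically use. */
        String inputFormat() {
            return source == Store.HBASE
                    ? "org.apache.hadoop.hbase.mapreduce.TableInputFormat"
                    // Assumption: the HDFS input is plain text files.
                    : "org.apache.hadoop.mapreduce.lib.input.TextInputFormat";
        }

        /** Output format a job for this scenario would typically use. */
        String outputFormat() {
            return sink == Store.HBASE
                    ? "org.apache.hadoop.hbase.mapreduce.TableOutputFormat"
                    // Assumption: the HDFS output is plain text files.
                    : "org.apache.hadoop.mapreduce.lib.output.TextOutputFormat";
        }

        /** Looks up the scenario for a source/sink pair listed in the summary. */
        static Scenario of(Store source, Store sink) {
            for (Scenario s : values()) {
                if (s.source == source && s.sink == sink) {
                    return s;
                }
            }
            throw new IllegalArgumentException(
                    "Combination not listed in the summary: " + source + " -> " + sink);
        }
    }

    public static void main(String[] args) {
        // Print one line per scenario: id, input format, output format.
        for (Scenario s : Scenario.values()) {
            System.out.println(s.id + ": " + s.inputFormat() + " -> " + s.outputFormat());
        }
    }
}
```

In a real job, these formats would be set on the Hadoop `Job` object through `setInputFormatClass` and `setOutputFormatClass`. For the HBase side, HBase's `TableMapReduceUtil` provides `initTableMapperJob` and `initTableReducerJob`, which configure the table formats for you.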
