We covered the Java API earlier. Let's start with how to insert data using
MapReduce.
Importing data into HBase using MapReduce
MapReduce is Hadoop's distributed processing engine. MapReduce programs
usually read their input from HDFS and write their output back to HDFS.
HBase supports MapReduce as well: it can be both the source and the sink
for MapReduce programs. As a source, HBase supplies the data that a
MapReduce program reads; as a sink, HBase receives the results that the
program writes.
The following diagram illustrates various sources and sinks for MapReduce:
[Diagram: (1) Input (HDFS) -> MapReduce -> Output (HDFS);
(2) Input (HDFS) -> MapReduce -> Output (HBase);
(3) Input (HBase) -> MapReduce -> Output (HBase)]
The diagram we just saw can be summarized as follows:

Scenario  Source  Sink   Description
1         HDFS    HDFS   This is a typical MapReduce method that reads data
                         from HDFS and also sends the results to HDFS.
2         HDFS    HBase  This imports the data from HDFS into HBase. It's a
                         very common method that is used to import data
                         into HBase for the first time.
3         HBase   HBase  Data is read from HBase and written to it. It is most
                         likely that these will be two separate HBase clusters.
                         It's usually used for backups and mirroring.
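
To make scenario 2 more concrete, the following is a minimal plain-Java sketch of the map step only: turning one line of an input file into the cell writes that would end up in HBase. It deliberately uses no Hadoop or HBase classes so that it runs on its own. The class HdfsToHBaseSketch, the CellWrite type, the "userId,name,email" line layout, and the column family "info" are all assumptions made for illustration. In a real job, the mapper would typically emit HBase Put objects and the job would write them through TableOutputFormat.

```java
import java.util.ArrayList;
import java.util.List;

public class HdfsToHBaseSketch {

    /** Plain-Java stand-in for one HBase cell write (hypothetical, not an HBase class). */
    public static final class CellWrite {
        public final String rowKey;
        public final String family;
        public final String qualifier;
        public final String value;

        public CellWrite(String rowKey, String family, String qualifier, String value) {
            this.rowKey = rowKey;
            this.family = family;
            this.qualifier = qualifier;
            this.value = value;
        }
    }

    /**
     * Map step: converts one input line of the assumed form "userId,name,email"
     * into two cell writes in the assumed column family "info".
     * Malformed lines produce no writes.
     */
    public static List<CellWrite> mapLine(String line) {
        List<CellWrite> writes = new ArrayList<>();
        String[] fields = line.split(",", -1); // -1 keeps trailing empty fields
        if (fields.length != 3 || fields[0].isEmpty()) {
            return writes; // skip lines without exactly three fields or without a row key
        }
        String rowKey = fields[0];
        writes.add(new CellWrite(rowKey, "info", "name", fields[1]));
        writes.add(new CellWrite(rowKey, "info", "email", fields[2]));
        return writes;
    }

    public static void main(String[] args) {
        // Stand-in for the lines of a file stored in HDFS.
        String[] input = {
            "u1,Alice,alice@example.com",
            "not a valid record",
            "u2,Bob,bob@example.com"
        };
        for (String line : input) {
            for (CellWrite w : mapLine(line)) {
                System.out.println(w.rowKey + " " + w.family + ":" + w.qualifier + " = " + w.value);
            }
        }
    }
}
```

Using the first field as the row key is only one possible choice. Real imports usually pick the row key according to how the table will be read later.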