• Using MapReduce to insert data in parallel (this approach also uses the Java
API), as shown in the following diagram:
[Diagram: three parallel mapreduce tasks, each issuing put() calls to HBase RegionServers]
• Using MapReduce to generate HBase store files (HFiles) in parallel, in bulk,
and then import them into HBase directly. (This approach does not go through
the client API, does not require custom code, and is very efficient.)
[Diagram: three parallel mapreduce tasks each (1) create() an HFile on HDFS; each HFile is then (2) passed via import() to an HBase RegionServer]
Comparing the three methods by speed, from slowest to fastest, gives
the following order:
Java client < MapReduce insert < HBase file import
The Java client and MapReduce both use the HBase APIs to insert data.
MapReduce runs on multiple machines and can exploit parallelism,
which is why it is faster than a single Java client. Even so, both
methods go through the normal write path in HBase.
Importing HBase files directly skips that write path. The files
already hold the data in the format that HBase understands, which is
why importing them is much faster than inserting through MapReduce or
the Java client.
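
The difference between the two paths can be sketched with a toy model in plain Java. This is not HBase code and uses no HBase classes: `WritePathSketch`, `ToyRegion`, `loadViaPuts`, and `loadViaBulk` are hypothetical names invented for this illustration, and the flush threshold is an arbitrary number. The sketch assumes only the general shape of the write path (each put is logged, buffered in a sorted in-memory store, and flushed to a store file when the buffer fills), and that a bulk import hands over a file that is already sorted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class WritePathSketch {

    /** Toy stand-in for a region; it only counts the work each load path does. */
    static class ToyRegion {
        final List<String> wal = new ArrayList<>();                 // log entry per put
        TreeMap<String, String> memStore = new TreeMap<>();         // sorted in-memory buffer
        final List<TreeMap<String, String>> storeFiles = new ArrayList<>();
        final int flushThreshold;                                   // arbitrary, for illustration
        int flushes = 0;

        ToyRegion(int flushThreshold) {
            this.flushThreshold = flushThreshold;
        }

        /** Normal write path: log the edit, buffer it, flush when the buffer is full. */
        void put(String rowKey, String value) {
            wal.add(rowKey + "=" + value);
            memStore.put(rowKey, value);
            if (memStore.size() >= flushThreshold) {
                storeFiles.add(memStore);
                memStore = new TreeMap<>();
                flushes++;
            }
        }

        /** Bulk import: adopt an already sorted file, with no logging or buffering. */
        void bulkLoad(TreeMap<String, String> preparedFile) {
            storeFiles.add(preparedFile);
        }

        /** Read the newest value for a row, checking the buffer before the files. */
        String get(String rowKey) {
            String value = memStore.get(rowKey);
            for (int i = storeFiles.size() - 1; value == null && i >= 0; i--) {
                value = storeFiles.get(i).get(rowKey);
            }
            return value;
        }
    }

    /** Loads rows one put at a time, as the Java client or a MapReduce insert would. */
    static ToyRegion loadViaPuts(int rows, int flushThreshold) {
        ToyRegion region = new ToyRegion(flushThreshold);
        for (int i = 0; i < rows; i++) {
            region.put(String.format("row-%04d", i), "value-" + i);
        }
        return region;
    }

    /** Prepares one sorted file outside the region, then imports it in a single step. */
    static ToyRegion loadViaBulk(int rows, int flushThreshold) {
        TreeMap<String, String> preparedFile = new TreeMap<>();
        for (int i = 0; i < rows; i++) {
            preparedFile.put(String.format("row-%04d", i), "value-" + i);
        }
        ToyRegion region = new ToyRegion(flushThreshold);
        region.bulkLoad(preparedFile);
        return region;
    }

    public static void main(String[] args) {
        ToyRegion viaPuts = loadViaPuts(1000, 100);
        ToyRegion viaBulk = loadViaBulk(1000, 100);
        System.out.println("puts: wal=" + viaPuts.wal.size() + " flushes=" + viaPuts.flushes);
        System.out.println("bulk: wal=" + viaBulk.wal.size() + " flushes=" + viaBulk.flushes);
    }
}
```

Both regions end up serving the same rows, but only the put-based load pays a per-row logging and buffering cost inside the region. In the bulk case the sorting work is done up front, outside the region, which is the role the MapReduce job plays when it generates store files.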
 