By default, Import will load data directly into HBase. To instead generate HFiles of
data to prepare for bulk data load, pass the following option:
-Dimport.bulk.output=/path/for/output
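For example, a minimal Import run that writes HFiles instead of live Puts might look like the following (the table name and HDFS paths here are illustrative):
$ hbase org.apache.hadoop.hbase.mapreduce.Import \
  -Dimport.bulk.output=/user/hbase/import-hfiles \
  mytable /user/hbase/export/mytable
The generated HFiles can then be moved into the table with the completebulkload tool, for example:
$ hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles \
  /user/hbase/import-hfiles mytable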
To apply a generic org.apache.hadoop.hbase.filter.Filter to the input,
use the following options:
-Dimport.filter.class=<name of filter class>
-Dimport.filter.args=<comma-separated list of args for filter>
The filter will be applied before renaming keys via the
HBASE_IMPORTER_RENAME_CFS property. Further, filters will only
use the Filter#filterRowKey(byte[] buffer, int offset, int length)
method to identify whether the current row needs to be ignored
completely for processing, and the Filter#filterKeyValue(KeyValue)
method to determine whether the KeyValue should be added;
Filter.ReturnCode#INCLUDE and #INCLUDE_AND_NEXT_COL will
be considered as including the KeyValue.
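As an illustration, the built-in org.apache.hadoop.hbase.filter.PrefixFilter could be used to import only rows whose keys begin with a given prefix (the prefix value, table name, and path below are hypothetical; Import instantiates the filter through the class's static createFilterFromArguments method, so only filter classes that implement it can be used this way):
$ hbase org.apache.hadoop.hbase.mapreduce.Import \
  -Dimport.filter.class=org.apache.hadoop.hbase.filter.PrefixFilter \
  -Dimport.filter.args=user123 \
  mytable /user/hbase/export/mytable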
For performance, and because speculatively executed tasks can write the same data twice, consider disabling speculative execution with the following options:
-Dmapred.map.tasks.speculative.execution=false
-Dmapred.reduce.tasks.speculative.execution=false
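Putting these together, a complete bulk-output Import run with speculative execution disabled might look like this (paths and table name again illustrative):
$ hbase org.apache.hadoop.hbase.mapreduce.Import \
  -Dmapred.map.tasks.speculative.execution=false \
  -Dmapred.reduce.tasks.speculative.execution=false \
  -Dimport.bulk.output=/user/hbase/import-hfiles \
  mytable /user/hbase/export/mytable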
Copy table
The CopyTable MapReduce job is used to scan through an HBase table and write
directly to another table; no intermediate flat file is created during this
process. Using this utility, Puts are performed directly against the sink table,
which can be on the same cluster or on an entirely different cluster. As with
the Export job, start and end timestamps can be specified for fine-grained
control over which data is copied. The CopyTable MapReduce job is invoked as
follows:
$ hbase org.apache.hadoop.hbase.mapreduce.CopyTable
Usage: CopyTable [general options] [--starttime=X] [--endtime=Y]
[--new.name=NEW] [--peer.adr=ADR] <tablename>
Options:
 rs.class     hbase.regionserver.class of the peer cluster
              specify if different from current cluster
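For instance, to copy rows written in a given time window into a differently named table on a remote cluster (the table names and ZooKeeper quorum are hypothetical; --peer.adr takes the form hbase.zookeeper.quorum:hbase.zookeeper.client.port:zookeeper.znode.parent):
$ hbase org.apache.hadoop.hbase.mapreduce.CopyTable \
  --starttime=1410000000000 --endtime=1411000000000 \
  --new.name=mytable_copy \
  --peer.adr=zk1,zk2,zk3:2181:/hbase \
  mytable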