The Export job takes the source table name and the output directory name as inputs. The number of versions, filters, and start and end timestamps can also be provided to the Export job for fine-grained control. Here, the start and end timestamps help in executing an incremental export from the tables. The data is written as Hadoop SequenceFiles in the specified output directory. The SequenceFile data is keyed on the row key, with the persisted Result instances as values:
$ hbase org.apache.hadoop.hbase.mapreduce.Export
Usage: Export [-D <property=value>]* <tablename> <outputdir> [<versions>
[<starttime> [<endtime>]] [^[regex pattern] or [Prefix] to filter]]
The -D properties will be applied to the conf used; for example:
-D mapred.output.compress=true
-D mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec
-D mapred.output.compression.type=BLOCK
Additionally, the following SCAN properties can be specified to control/limit what is exported:
-D hbase.mapreduce.scan.column.family=<familyName>
-D hbase.mapreduce.include.deleted.rows=true
For performance, consider the following properties:
-Dhbase.client.scanner.caching=100
-Dmapred.map.tasks.speculative.execution=false
-Dmapred.reduce.tasks.speculative.execution=false
For tables with very wide rows, consider setting the batch size as follows:
-Dhbase.export.scanner.batch=10
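As an illustration, an incremental export of a single version of each cell written within a given time window might look like the following; the table name, output path, and epoch timestamps (in milliseconds) are only placeholders:
$ hbase org.apache.hadoop.hbase.mapreduce.Export \
  -D hbase.client.scanner.caching=100 \
  -D mapred.map.tasks.speculative.execution=false \
  customers /backup/customers_incr 1 1420070400000 1420156800000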
The Import job reads the records from the source SequenceFiles, creating Put instances from the persisted Result instances. It then uses the HTable API to write these puts to the target table. The Import option does not provide filtering of the data while inserting into tables; for any additional data manipulation, a custom implementation needs to be provided by extending the Import class.
$ hbase org.apache.hadoop.hbase.mapreduce.Import
Usage: Import [options] <tablename> <inputdir>
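To restore the exported data, the same output directory can be passed to the Import job against a target table that already exists; the table and path names below are only placeholders:
$ hbase org.apache.hadoop.hbase.mapreduce.Import customers_restore /backup/customers_incr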
 