The Export job takes the source table name and the output directory name as inputs. The number of versions, filters, and start and end timestamps can also be provided to the export job for fine-grained control. The start and end timestamps make it possible to run incremental exports from the tables. The data is written as Hadoop SequenceFiles in the specified output directory, keyed by rowkey and storing the persisted Result instances as values:
$ hbase org.apache.hadoop.hbase.mapreduce.Export
Usage: Export [-D <property=value>]* <tablename> <outputdir> [<versions>
[<starttime> [<endtime>]] [^[regex pattern] or [Prefix] to filter]]
The -D properties will be applied to the conf used; for example:
-D mapred.output.compress=true
-D mapred.output.compression.codec=org.apache.hadoop.io.compress.
GzipCodec
-D mapred.output.compression.type=BLOCK
Additionally, the following SCAN properties can be specified to control/limit what is exported:
-D hbase.mapreduce.scan.column.family=<familyName>
-D hbase.mapreduce.include.deleted.rows=true
For performance, consider the following properties:
-Dhbase.client.scanner.caching=100
-Dmapred.map.tasks.speculative.execution=false
-Dmapred.reduce.tasks.speculative.execution=false
For tables with very wide rows, consider setting the batch size as follows:
-Dhbase.export.scanner.batch=10
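Putting the options above together, an incremental export of the last 24 hours of writes might look like the following sketch. The table name sales, the output path /backup/sales_incremental, and the chosen property values are hypothetical; adjust them for your cluster:

```shell
#!/usr/bin/env bash
# Compute a 24-hour window in epoch milliseconds (HBase timestamps are ms).
END=$(date +%s%3N)                      # current time in ms
START=$((END - 24*60*60*1000))          # 24 hours earlier

# Export one version of each cell written in [START, END) from the
# (hypothetical) table "sales" to the (hypothetical) HDFS output directory.
hbase org.apache.hadoop.hbase.mapreduce.Export \
  -D mapred.output.compress=true \
  -D mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec \
  -D hbase.client.scanner.caching=100 \
  sales /backup/sales_incremental 1 "$START" "$END"
```

Note the positional argument order from the usage string: the version count (1 here) must precede the start and end timestamps.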
The Import job reads the records from the source sequence file, creating Put instances from the persisted Result instances. It then uses the HTable API to write these puts to the target table. The Import option does not provide filtering of the data while inserting into tables; for any additional data manipulation, a custom implementation needs to be provided by extending the Import class.
$ hbase org.apache.hadoop.hbase.mapreduce.Import
Usage: Import [options] <tablename> <inputdir>
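For completeness, a matching import run might look like the following sketch. The target table name sales_restored and the input path are hypothetical, and the target table must already exist with the same column families as the exported source:

```shell
#!/usr/bin/env bash
# Hypothetical target table and input directory; the table must be
# pre-created (Import does not create it for you).
TABLE=sales_restored
INPUT=/backup/sales_incremental

# Disable speculative execution so the same Put is not written twice
# by duplicate map attempts.
hbase org.apache.hadoop.hbase.mapreduce.Import \
  -D mapred.map.tasks.speculative.execution=false \
  "$TABLE" "$INPUT"
```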