HBase - Hadoop: The Definitive Guide

<reverse_order_timestamp>

*/

public static byte[] makeObservationRowKey(String stationId,
    long observationTime) {
  byte[] row = new byte[STATION_ID_LENGTH + Bytes.SIZEOF_LONG];
  Bytes.putBytes(row, 0, Bytes.toBytes(stationId), 0,
      STATION_ID_LENGTH);
  long reverseOrderTimestamp = Long.MAX_VALUE - observationTime;
  Bytes.putLong(row, STATION_ID_LENGTH, reverseOrderTimestamp);
  return row;
}

The conversion takes advantage of the fact that the station ID is a fixed-length ASCII string. As in the earlier example, we use HBase's Bytes class to convert between byte arrays and common Java types. The Bytes.SIZEOF_LONG constant is used to calculate the size of the timestamp portion of the row key byte array. The putBytes() and putLong() methods fill the station ID and timestamp portions of the key at the relevant offsets in the byte array.
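To make the key layout concrete, here is a minimal plain-JDK sketch of the same idea. It is not the book's code: it swaps HBase's Bytes class for java.nio.ByteBuffer so that it runs without HBase on the classpath, the class name RowKeySketch is hypothetical, and the STATION_ID_LENGTH value of 12 is an assumption for illustration. Because the timestamp is subtracted from Long.MAX_VALUE, a newer observation yields a smaller key, which sorts first under byte-wise comparison.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Plain-JDK sketch of the row key layout; a stand-in for the HBase-based method. */
final class RowKeySketch {
  // Assumed width for illustration; the real constant is defined in the example code.
  static final int STATION_ID_LENGTH = 12;

  static byte[] makeObservationRowKey(String stationId, long observationTime) {
    byte[] id = stationId.getBytes(StandardCharsets.US_ASCII);
    if (id.length != STATION_ID_LENGTH) {
      throw new IllegalArgumentException(
          "station ID must be " + STATION_ID_LENGTH + " ASCII characters");
    }
    // ByteBuffer writes longs big-endian by default, so for non-negative
    // values the byte order of the key follows the numeric order.
    return ByteBuffer.allocate(STATION_ID_LENGTH + Long.BYTES)
        .put(id)                                     // station ID portion
        .putLong(Long.MAX_VALUE - observationTime)   // reverse-order timestamp
        .array();
  }

  public static void main(String[] args) {
    byte[] older = makeObservationRowKey("011990-99999", 1_000L);
    byte[] newer = makeObservationRowKey("011990-99999", 2_000L);
    // Compare the keys as unsigned bytes, lexicographically.
    System.out.println(Arrays.compareUnsigned(newer, older) < 0); // prints "true"
  }
}
```

For a single station, scanning keys built this way in ascending byte order visits the most recent observation first.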

The job is configured in the run() method to use HBase's TableOutputFormat. The table to write to must be specified by setting the TableOutputFormat.OUTPUT_TABLE property in the job configuration.

It's convenient to use TableOutputFormat since it manages the creation of an HTable instance for us, which otherwise we would do in the mapper's setup() method (along with a call to close() in the cleanup() method). TableOutputFormat also disables the HTable auto-flush feature, so that calls to put() are buffered for greater efficiency.
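The efficiency gain from buffering can be illustrated with a toy model. The sketch below is not HBase code: BufferedWriterSketch is a hypothetical class, its flush threshold is counted in puts purely for simplicity, and the "round trips" are just a counter. It only shows why batching writes reduces the number of trips to the server compared with flushing on every put().

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of client-side write buffering; a hypothetical class, not HBase's HTable. */
final class BufferedWriterSketch {
  private final int capacity;                    // hypothetical flush threshold, in puts
  private final List<String> buffer = new ArrayList<>();
  private int roundTrips = 0;                    // batches "sent" to the server so far

  BufferedWriterSketch(int capacity) {
    this.capacity = capacity;
  }

  void put(String row) {
    buffer.add(row);
    if (buffer.size() >= capacity) {
      flush();
    }
  }

  void flush() {
    if (buffer.isEmpty()) {
      return;
    }
    roundTrips++;       // one simulated round trip per batch
    buffer.clear();
  }

  void close() {
    flush();            // send whatever is still buffered
  }

  int roundTrips() {
    return roundTrips;
  }

  public static void main(String[] args) {
    BufferedWriterSketch buffered = new BufferedWriterSketch(100);
    BufferedWriterSketch autoFlush = new BufferedWriterSketch(1); // flush on every put
    for (int i = 0; i < 250; i++) {
      buffered.put("row" + i);
      autoFlush.put("row" + i);
    }
    buffered.close();
    autoFlush.close();
    System.out.println(buffered.roundTrips() + " vs " + autoFlush.roundTrips()); // prints "3 vs 250"
  }
}
```

The model also shows why the final close() matters: without it, the last partial batch would never be sent.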

The example code includes a class called HBaseTemperatureDirectImporter to demonstrate how to use an HTable directly from a MapReduce program. We can run the HBaseTemperatureImporter program with the following:

% hbase HBaseTemperatureImporter input/ncdc/all

Load distribution

Watch for the phenomenon where an import walks in lockstep through the table, with all clients in concert pounding one of the table's regions (and thus, a single node), then moving on to the next, and so on, rather than evenly distributing the load over all regions. This is usually brought on by some interaction between sorted input and how the splitter works. Randomizing the ordering of your row keys prior to insertion may help. In our example,
