<reverse_order_timestamp>
*/
public static byte[] makeObservationRowKey(String stationId,
    long observationTime) {
  // Key layout: fixed-length station ID followed by an 8-byte timestamp
  byte[] row = new byte[STATION_ID_LENGTH + Bytes.SIZEOF_LONG];
  // Copy exactly STATION_ID_LENGTH bytes of the ASCII station ID to offset 0
  Bytes.putBytes(row, 0, Bytes.toBytes(stationId), 0, STATION_ID_LENGTH);
  // Invert the timestamp so that newer observations sort first
  long reverseOrderTimestamp = Long.MAX_VALUE - observationTime;
  Bytes.putLong(row, STATION_ID_LENGTH, reverseOrderTimestamp);
  return row;
}
}
The conversion takes advantage of the fact that the station ID is a fixed-length ASCII
string. As in the earlier example, we use HBase's Bytes class for converting between
byte arrays and common Java types. The Bytes.SIZEOF_LONG constant is used for
calculating the size of the timestamp portion of the row key byte array. The
putBytes() and putLong() methods are used to fill the station ID and timestamp
portions of the key at the relevant offsets in the byte array. Subtracting the
observation time from Long.MAX_VALUE reverses the timestamp's sort order, so that
within each station the most recent observations sort first.
The job is configured in the run() method to use HBase's TableOutputFormat.
The table to write to must be specified by setting the
TableOutputFormat.OUTPUT_TABLE property in the job configuration.
It's convenient to use TableOutputFormat since it manages the creation of an
HTable instance for us, which we would otherwise have to do ourselves in the mapper's
setup() method (along with a call to close() in the cleanup() method). TableOutputFormat
also disables the HTable auto-flush feature, so that calls to put() are buffered for
greater efficiency.
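The effect of turning auto-flush off can be illustrated with a toy model. The following is a minimal sketch of client-side write buffering, not HBase's HTable code: the class BufferedPutWriter, its bufferSize parameter, and its flush() method are hypothetical, and a "round trip" here simply counts how many batches would be sent to the servers.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of client-side write buffering; not HBase's HTable code.
public class BufferedPutWriter implements AutoCloseable {
    private final int bufferSize;             // puts to queue before sending a batch
    private final List<String> buffer = new ArrayList<>();
    private int roundTrips = 0;               // simulated batches sent to the servers
    private int rowsWritten = 0;

    public BufferedPutWriter(int bufferSize) {
        if (bufferSize < 1) throw new IllegalArgumentException("bufferSize must be >= 1");
        this.bufferSize = bufferSize;
    }

    // With auto-flush off, a put is only queued locally...
    public void put(String row) {
        buffer.add(row);
        if (buffer.size() >= bufferSize) {
            flush();
        }
    }

    // ...and the queued puts are sent together in one simulated round trip.
    public void flush() {
        if (buffer.isEmpty()) return;
        roundTrips++;
        rowsWritten += buffer.size();
        buffer.clear();
    }

    // Anything still buffered is sent on close; skipping this would lose writes.
    @Override
    public void close() {
        flush();
    }

    public int roundTrips() { return roundTrips; }
    public int rowsWritten() { return rowsWritten; }

    public static void main(String[] args) {
        BufferedPutWriter writer = new BufferedPutWriter(3);
        for (int i = 0; i < 7; i++) {
            writer.put("row-" + i);
        }
        writer.close();
        // 7 puts -> 3 round trips (3 + 3 + 1) instead of 7 with auto-flush on
        System.out.println(writer.rowsWritten() + " rows in " + writer.roundTrips() + " round trips");
    }
}
```

Running main prints "7 rows in 3 round trips". The close() step mirrors why the text pairs setup() with a close() call in cleanup() when an HTable is managed by hand.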
The example code also includes a class called HBaseTemperatureDirectImporter, which
demonstrates how to use an HTable directly from a MapReduce program. We can run the
HBaseTemperatureImporter program with the following:
% hbase HBaseTemperatureImporter input/ncdc/all
Load distribution
Watch for the phenomenon where an import walks in lockstep through the table, with all
clients in concert pounding one of the table's regions (and thus, a single node), then moving
on to the next, and so on, rather than evenly distributing the load over all regions. This
is usually brought on by some interaction between sorted input and how the splitter works.
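A region holds a contiguous range of row keys, so a batch of consecutive keys from sorted input tends to land in a single region. The following minimal sketch assumes a hypothetical table split into four regions at made-up start keys ("row-250", "row-500", "row-750"); the class name RegionLockstepSketch and the region names are illustrative only. It locates each key's region with TreeMap.floorEntry and counts how many distinct regions one batch of writes touches, for sorted and for shuffled input.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;
import java.util.TreeMap;

public class RegionLockstepSketch {
    // Hypothetical region start keys; "" marks the first region of the table.
    static final TreeMap<String, String> REGIONS = new TreeMap<>();
    static {
        REGIONS.put("", "region-A");
        REGIONS.put("row-250", "region-B");
        REGIONS.put("row-500", "region-C");
        REGIONS.put("row-750", "region-D");
    }

    // A row belongs to the region with the greatest start key <= the row key.
    static String regionFor(String rowKey) {
        return REGIONS.floorEntry(rowKey).getValue();
    }

    // Number of distinct regions touched by one batch of writes.
    static int regionsHit(List<String> batch) {
        Set<String> hit = new HashSet<>();
        for (String key : batch) {
            hit.add(regionFor(key));
        }
        return hit.size();
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            keys.add(String.format("row-%03d", i)); // zero-padded so string order = numeric order
        }
        // Sorted input: the first 100 writes all fall inside region-A.
        System.out.println("sorted:   " + regionsHit(keys.subList(0, 100)));

        // Shuffled input: a batch of the same size is spread across regions.
        List<String> shuffled = new ArrayList<>(keys);
        Collections.shuffle(shuffled, new Random(42));
        System.out.println("shuffled: " + regionsHit(shuffled.subList(0, 100)));
    }
}
```

The sorted batch reports a single region, while the shuffled batch is spread over several, which is the imbalance the sidebar describes.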
Randomizing the ordering of your row keys prior to insertion may help. In our example,