<reverse_order_timestamp>
*/
public static byte[] makeObservationRowKey(String stationId,
    long observationTime) {
  byte[] row = new byte[STATION_ID_LENGTH + Bytes.SIZEOF_LONG];
  Bytes.putBytes(row, 0, Bytes.toBytes(stationId), 0, STATION_ID_LENGTH);
  long reverseOrderTimestamp = Long.MAX_VALUE - observationTime;
  Bytes.putLong(row, STATION_ID_LENGTH, reverseOrderTimestamp);
  return row;
}
}
The conversion takes advantage of the fact that the station ID is a fixed-length ASCII
string. As in the earlier example, we use HBase's Bytes class for converting between
byte arrays and common Java types. The Bytes.SIZEOF_LONG constant is used for
calculating the size of the timestamp portion of the row key byte array. The putBytes()
and putLong() methods are used to fill the station ID and timestamp portions of the key
at the relevant offsets in the byte array. Subtracting the observation time from
Long.MAX_VALUE reverses the sort order of the timestamps, so the most recent
observation for a station sorts first.
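To make the key layout and its sort order concrete, here is a minimal sketch that builds the same kind of key using only JDK classes, so it runs without HBase on the classpath. It is not the book's code: the class name RowKeySketch, the compareKeys() helper, and the station ID width of 12 are assumptions for illustration, and ByteBuffer stands in for HBase's Bytes class. It relies on row keys being compared byte by byte as unsigned values, and on the timestamp being written big-endian.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class RowKeySketch {
  // Assumed fixed width of a station ID (hypothetical value for this sketch)
  static final int STATION_ID_LENGTH = 12;

  // Builds <station_id><reverse_order_timestamp> with plain JDK classes.
  static byte[] makeRowKey(String stationId, long observationTime) {
    byte[] id = stationId.getBytes(StandardCharsets.US_ASCII);
    if (id.length != STATION_ID_LENGTH) {
      throw new IllegalArgumentException(
          "station ID must be " + STATION_ID_LENGTH + " ASCII characters");
    }
    ByteBuffer buf = ByteBuffer.allocate(STATION_ID_LENGTH + Long.BYTES);
    buf.put(id);                                    // station ID portion
    buf.putLong(Long.MAX_VALUE - observationTime);  // timestamp portion, big-endian
    return buf.array();
  }

  // Unsigned lexicographic comparison of two keys.
  static int compareKeys(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int diff = (a[i] & 0xff) - (b[i] & 0xff);
      if (diff != 0) {
        return diff;
      }
    }
    return a.length - b.length;
  }

  public static void main(String[] args) {
    byte[] older = makeRowKey("011990-99999", 1000L);
    byte[] newer = makeRowKey("011990-99999", 2000L);
    System.out.println(older.length);                   // 12 + 8 bytes
    System.out.println(compareKeys(newer, older) < 0);  // newer key sorts first
  }
}
```

For non-negative observation times, the key for the later observation compares lower, which is why a scan starting at a station's prefix meets its newest observations first.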
The job is configured in the run() method to use HBase's TableOutputFormat. The table
to write to must be specified by setting the TableOutputFormat.OUTPUT_TABLE property
in the job configuration.
It's convenient to use TableOutputFormat since it manages the creation of an HTable
instance for us, which otherwise we would do in the mapper's setup() method (along
with a call to close() in the cleanup() method). TableOutputFormat also disables the
HTable auto-flush feature, so that calls to put() are buffered for greater efficiency.
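The effect of buffering can be illustrated with a toy model. The sketch below is not HBase code and does not use the HTable API: BufferedWriterSketch and its methods are hypothetical names, and it flushes after a fixed number of puts purely for illustration, which is an assumption rather than how the real client decides when to flush. It only counts how many simulated trips to the server a series of puts would need.

```java
import java.util.ArrayList;
import java.util.List;

class BufferedWriterSketch {
  private final int bufferLimit;  // hypothetical: flush after this many puts
  private final List<String> buffer = new ArrayList<>();
  private int roundTrips = 0;     // simulated trips to the server

  BufferedWriterSketch(int bufferLimit) {
    this.bufferLimit = bufferLimit;
  }

  // Queue a write; send the batch once the buffer is full.
  void put(String row) {
    buffer.add(row);
    if (buffer.size() >= bufferLimit) {
      flush();
    }
  }

  // Send everything queued so far in a single simulated trip.
  void flush() {
    if (buffer.isEmpty()) {
      return;
    }
    roundTrips++;
    buffer.clear();
  }

  // Returns the number of simulated trips needed for the given number of puts.
  static int simulate(int puts, int bufferLimit) {
    BufferedWriterSketch writer = new BufferedWriterSketch(bufferLimit);
    for (int i = 0; i < puts; i++) {
      writer.put("row-" + i);
    }
    writer.flush();  // final flush, as a close() at the end of a task would do
    return writer.roundTrips;
  }

  public static void main(String[] args) {
    System.out.println(simulate(1000, 1));    // a limit of 1 behaves like auto-flush
    System.out.println(simulate(1000, 100));  // buffered writes
  }
}
```

With a limit of 1 every put costs its own trip, while a limit of 100 sends the same 1,000 puts in 10 batches. The final flush() matters: anything still sitting in the buffer when the task ends would otherwise never be written.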
The example code includes a class called HBaseTemperatureDirectImporter to
demonstrate how to use an HTable directly from a MapReduce program. We can run the
program with the following:
% hbase HBaseTemperatureImporter input/ncdc/all
Load distribution
Watch for the phenomenon where an import walks in lockstep through the table, with all
clients in concert pounding one of the table's regions (and thus, a single node), then moving
on to the next, and so on, rather than evenly distributing the load over all regions. This
is usually brought on by some interaction between sorted input and how the splitter works.
Randomizing the ordering of your row keys prior to insertion may help. In our example,