Loading Data
There are a relatively small number of stations, so their static data is easily inserted using any of the available interfaces. The example code includes a Java application for doing this, which is run as follows:

% hbase HBaseStationImporter input/ncdc/metadata/stations-fixed-width.txt
However, let's assume that there are billions of individual observations to be loaded. An import of this scale is normally a complex and long-running database operation, but MapReduce and HBase's distribution model allow us to make full use of the cluster. We'll copy the raw input data onto HDFS, and then run a MapReduce job that reads the input and writes to HBase. Example 20-3 shows a MapReduce job that imports observations into HBase from the same input files used in the previous chapters' examples.
Example 20-3. A MapReduce application to import temperature data from HDFS into an
HBase table
public class HBaseTemperatureImporter extends Configured implements Tool {

  static class HBaseTemperatureMapper<K> extends Mapper<LongWritable, Text, K, Put> {

    private NcdcRecordParser parser = new NcdcRecordParser();

    @Override
    public void map(LongWritable key, Text value, Context context) throws
        IOException, InterruptedException {
      parser.parse(value.toString());
      if (parser.isValidTemperature()) {
        byte[] rowKey = RowKeyConverter.makeObservationRowKey(parser.getStationId(),
            parser.getObservationDate().getTime());
        Put p = new Put(rowKey);
        p.add(HBaseTemperatureQuery.DATA_COLUMNFAMILY,
            HBaseTemperatureQuery.AIRTEMP_QUALIFIER,
            Bytes.toBytes(parser.getAirTemperature()));
        context.write(null, p);
      }
    }
  }
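The RowKeyConverter.makeObservationRowKey() helper used by the mapper is not shown in this excerpt. The sketch below illustrates one plausible way such a row key could be built, using only the standard library in place of HBase's Bytes utility class. The 12-byte station-ID width and the reverse-timestamp trick (Long.MAX_VALUE minus the observation time, so a station's most recent observations sort first) are assumptions for illustration, not the book's definitive implementation:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class RowKeySketch {

    // Assumed fixed width for NCDC station IDs (e.g. "011990-99999" is 12 bytes).
    static final int STATION_ID_LENGTH = 12;

    // Row key layout: [station ID, 12 bytes][Long.MAX_VALUE - timestamp, 8 bytes].
    // A fixed-width prefix keeps each station's observations contiguous; the
    // reversed big-endian timestamp makes newer observations sort first.
    public static byte[] makeObservationRowKey(String stationId, long observationTime) {
        ByteBuffer buf = ByteBuffer.allocate(STATION_ID_LENGTH + Long.BYTES);
        byte[] id = stationId.getBytes(StandardCharsets.US_ASCII);
        buf.put(id, 0, STATION_ID_LENGTH);          // ByteBuffer is big-endian by default
        buf.putLong(Long.MAX_VALUE - observationTime);
        return buf.array();
    }

    // Lexicographic comparison of unsigned bytes, as HBase orders row keys.
    public static int compareUnsigned(byte[] a, byte[] b) {
        for (int i = 0; i < a.length && i < b.length; i++) {
            int cmp = Integer.compare(a[i] & 0xff, b[i] & 0xff);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.length, b.length);
    }

    public static void main(String[] args) {
        byte[] earlier = makeObservationRowKey("011990-99999", 1000L);
        byte[] later   = makeObservationRowKey("011990-99999", 2000L);
        System.out.println(earlier.length);                    // 20
        System.out.println(compareUnsigned(later, earlier) < 0); // true: newer sorts first
    }
}
```

Because HBase stores rows sorted by key, this layout turns "latest readings for a station" into a short scan from the station-ID prefix, which is the access pattern the queries in this chapter rely on.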