Loading Data
There are a relatively small number of stations, so their static data is easily inserted using
any of the available interfaces. The example code includes a Java application for doing
this, which is run as follows:
% hbase HBaseStationImporter input/ncdc/metadata/stations-fixed-width.txt
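Although HBaseStationImporter itself isn't reproduced here, a single-client importer for a dataset this small can be very simple. The following is a minimal sketch under assumed conditions: the HBase 1.x client API (Put.addColumn rather than the older Put.add used in Example 20-3), a pre-created stations table with an info column family, and made-up fixed-width offsets. The class, table, and column names are illustrations, not the book's code.

import java.io.BufferedReader;
import java.io.FileReader;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class StationImporterSketch {
  public static void main(String[] args) throws Exception {
    try (Connection conn =
             ConnectionFactory.createConnection(HBaseConfiguration.create());
         Table table = conn.getTable(TableName.valueOf("stations"));
         BufferedReader in = new BufferedReader(new FileReader(args[0]))) {
      String line;
      while ((line = in.readLine()) != null) {
        // Illustrative fixed-width offsets; the real NCDC metadata layout differs.
        String stationId = line.substring(0, 12).trim();
        String name = line.substring(12).trim();
        Put p = new Put(Bytes.toBytes(stationId)); // station ID as the row key
        p.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"),
            Bytes.toBytes(name));
        table.put(p); // one unbatched put per station
      }
    }
  }
}

Because there are only a handful of stations, unbatched single puts through the client API are perfectly adequate; it is the billions of observations that call for the MapReduce approach described next.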
However, let's assume that there are billions of individual observations to be loaded. This
kind of import is normally an extremely complex and long-running database operation,
but MapReduce and HBase's distribution model allow us to make full use of the cluster.
We'll copy the raw input data onto HDFS, and then run a MapReduce job that can read
the input and write to HBase.
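The copy itself is an ordinary HDFS command. Assuming the raw files sit in a local input/ncdc/all directory (an illustrative path, not one fixed by the book), it might look like this:

% hadoop fs -put input/ncdc/all input/ncdc/all

which copies the local directory to a path of the same name under the user's HDFS home directory.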
Example 20-3 shows a MapReduce job that imports observations into HBase from the same input files used in previous chapters' examples.
Example 20-3. A MapReduce application to import temperature data from HDFS into an
HBase table
import java.io.IOException;

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class HBaseTemperatureImporter extends Configured implements Tool {

  // Mapper that parses each NCDC line and emits a Put for valid readings.
  // The output key type K is unused; TableOutputFormat ignores the key.
  static class HBaseTemperatureMapper<K> extends Mapper<LongWritable, Text, K, Put> {

    private NcdcRecordParser parser = new NcdcRecordParser();

    @Override
    public void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      parser.parse(value.toString());
      if (parser.isValidTemperature()) {
        // The row key packs station ID and observation time together, so a
        // station's observations are stored contiguously (see RowKeyConverter).
        byte[] rowKey = RowKeyConverter.makeObservationRowKey(
            parser.getStationId(), parser.getObservationDate().getTime());
        Put p = new Put(rowKey);
        p.add(HBaseTemperatureQuery.DATA_COLUMNFAMILY,
            HBaseTemperatureQuery.AIRTEMP_QUALIFIER,
            Bytes.toBytes(parser.getAirTemperature()));
        context.write(null, p);
      }
    }
  }
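
  // The run() and main() methods below are not part of the excerpt above;
  // they are a hedged sketch of a driver that completes the Tool, assuming
  // an existing "observations" table and HBase's TableOutputFormat, which
  // turns the Puts emitted by the mapper into writes against that table.
  @Override
  public int run(String[] args) throws Exception {
    if (args.length != 1) {
      System.err.println("Usage: HBaseTemperatureImporter <input>");
      return -1;
    }
    Job job = Job.getInstance(getConf(), getClass().getSimpleName());
    job.setJarByClass(getClass());
    FileInputFormat.addInputPath(job, new Path(args[0]));
    job.getConfiguration().set(TableOutputFormat.OUTPUT_TABLE, "observations");
    job.setMapperClass(HBaseTemperatureMapper.class);
    job.setNumReduceTasks(0); // map-only: Puts go straight to the region servers
    job.setOutputFormatClass(TableOutputFormat.class);
    return job.waitForCompletion(true) ? 0 : 1;
  }

  public static void main(String[] args) throws Exception {
    int exitCode = ToolRunner.run(HBaseConfiguration.create(),
        new HBaseTemperatureImporter(), args);
    System.exit(exitCode);
  }
}

Setting the number of reduce tasks to zero keeps the job map-only, so there is no sort and shuffle phase and each map task writes its Puts directly to HBase. Once built, the importer would be run in the same way as the station importer, for example (the input path is illustrative):

% hbase HBaseTemperatureImporter input/ncdc/all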