Loading Data
There are a relatively small number of stations, so their static data is easily inserted using any of the available interfaces. The example code includes a Java application for doing this, which is run as follows:

% hbase HBaseStationImporter input/ncdc/metadata/stations-fixed-width.txt
However, let's assume that there are billions of individual observations to be loaded. An import of this scale is normally a complex and long-running database operation, but MapReduce and HBase's distribution model allow us to make full use of the cluster. We'll copy the raw input data onto HDFS, and then run a MapReduce job that reads the input and writes to HBase. Example 20-3 shows a MapReduce job that imports observations into HBase from the same input files used in the previous chapters' examples.
Example 20-3. A MapReduce application to import temperature data from HDFS into an
HBase table
public class HBaseTemperatureImporter extends Configured implements Tool {

  static class HBaseTemperatureMapper<K> extends Mapper<LongWritable, Text, K, Put> {

    private NcdcRecordParser parser = new NcdcRecordParser();

    @Override
    public void map(LongWritable key, Text value, Context context) throws
        IOException, InterruptedException {
      parser.parse(value.toString());
      if (parser.isValidTemperature()) {
        byte[] rowKey = RowKeyConverter.makeObservationRowKey(parser.getStationId(),
            parser.getObservationDate().getTime());
        Put p = new Put(rowKey);
        p.add(HBaseTemperatureQuery.DATA_COLUMNFAMILY,
            HBaseTemperatureQuery.AIRTEMP_QUALIFIER,
            Bytes.toBytes(parser.getAirTemperature()));
        context.write(null, p);
      }
    }
  }
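The RowKeyConverter.makeObservationRowKey() helper used by the mapper is not shown in this excerpt. The sketch below illustrates one plausible way such a row key could be built, using only the standard library in place of HBase's Bytes utility class. The 12-byte station-ID width and the reverse-timestamp trick (Long.MAX_VALUE minus the observation time, so a station's most recent observations sort first) are assumptions for illustration, not the book's definitive implementation:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class RowKeySketch {

    // Assumed fixed width for NCDC station IDs (e.g. "011990-99999" is 12 bytes).
    static final int STATION_ID_LENGTH = 12;

    // Row key layout: [station ID, 12 bytes][Long.MAX_VALUE - timestamp, 8 bytes].
    // A fixed-width prefix keeps each station's observations contiguous; the
    // reversed big-endian timestamp makes newer observations sort first.
    public static byte[] makeObservationRowKey(String stationId, long observationTime) {
        ByteBuffer buf = ByteBuffer.allocate(STATION_ID_LENGTH + Long.BYTES);
        byte[] id = stationId.getBytes(StandardCharsets.US_ASCII);
        buf.put(id, 0, STATION_ID_LENGTH);          // ByteBuffer is big-endian by default
        buf.putLong(Long.MAX_VALUE - observationTime);
        return buf.array();
    }

    // Lexicographic comparison of unsigned bytes, as HBase orders row keys.
    public static int compareUnsigned(byte[] a, byte[] b) {
        for (int i = 0; i < a.length && i < b.length; i++) {
            int cmp = Integer.compare(a[i] & 0xff, b[i] & 0xff);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.length, b.length);
    }

    public static void main(String[] args) {
        byte[] earlier = makeObservationRowKey("011990-99999", 1000L);
        byte[] later   = makeObservationRowKey("011990-99999", 2000L);
        System.out.println(earlier.length);                    // 20
        System.out.println(compareUnsigned(later, earlier) < 0); // true: newer sorts first
    }
}
```

Because HBase stores rows sorted by key, this layout turns "latest readings for a station" into a short scan from the station-ID prefix, which is the access pattern the queries in this chapter rely on.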