Scalable Parallel Processing with MapReduce - Professional NoSQL - page 226

MAPREDUCE WITH HBASE

Next, you upload the NYSE data set into an HBase instance. This time, use MapReduce itself to parse the .csv files and populate the data into HBase. Such “chained” usage of MapReduce is quite popular and serves well to parse large files. Once the data is uploaded to HBase, you can use MapReduce a second time to run a few aggregate queries. Two examples of MapReduce have already been illustrated; this third one should reinforce the concept and demonstrate its suitability for multiple situations.
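The two chained passes described above can be sketched in plain Java without any Hadoop or HBase dependency. This is only an illustration of the flow, not the chapter's implementation: the in-memory map stands in for an HBase table, and the class name, the three-column line layout (symbol, date, close), and the sample values are all assumptions made up for this sketch.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Dependency-free sketch of "chained" processing: pass 1 parses and loads,
// pass 2 aggregates over what pass 1 stored. Nothing here is Hadoop or HBase API.
class ChainedPassesSketch {

    // Pass 1: parse CSV lines and store each one as a row keyed by "symbol-date".
    // The nested map is a stand-in for an HBase table; column names are hypothetical.
    static Map<String, Map<String, String>> loadPass(List<String> csvLines) {
        Map<String, Map<String, String>> table = new LinkedHashMap<>();
        for (String line : csvLines) {
            String[] fields = line.split(",");
            if (fields.length != 3) {
                continue; // skip malformed lines
            }
            Map<String, String> row = new LinkedHashMap<>();
            row.put("symbol", fields[0]);
            row.put("date", fields[1]);
            row.put("close", fields[2]);
            table.put(fields[0] + "-" + fields[1], row);
        }
        return table;
    }

    // Pass 2: read the stored rows back and compute the average close per symbol.
    static Map<String, Double> averageClosePass(Map<String, Map<String, String>> table) {
        Map<String, double[]> sums = new LinkedHashMap<>(); // value holds {sum, count}
        for (Map<String, String> row : table.values()) {
            double[] acc = sums.computeIfAbsent(row.get("symbol"), k -> new double[2]);
            acc[0] += Double.parseDouble(row.get("close"));
            acc[1] += 1;
        }
        Map<String, Double> averages = new LinkedHashMap<>();
        for (Map.Entry<String, double[]> e : sums.entrySet()) {
            averages.put(e.getKey(), e.getValue()[0] / e.getValue()[1]);
        }
        return averages;
    }

    public static void main(String[] args) {
        // Made-up sample lines, not taken from the NYSE data set.
        List<String> lines = List.of(
                "AAA,2010-01-04,10.0",
                "AAA,2010-01-05,20.0",
                "BBB,2010-01-04,5.0");
        System.out.println(averageClosePass(loadPass(lines)));
    }
}
```

In a real deployment each pass would be its own MapReduce job, with the second job reading from the table the first job wrote.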

To use MapReduce with HBase, you can use Java as the programming language. It's not the only option, though: you could write MapReduce jobs in Python, Ruby, or PHP and have HBase as the source and/or sink for the job. In this example, I create four program elements that need to work together:

➤ A mapper class that emits key/value pairs.

➤ A reducer class that takes the values emitted by the mapper and manipulates them to create aggregations. In the data upload example, the mapper only inserts the data into an HBase table.

➤ A driver class that puts the mapper class and the reducer class together.

➤ A class that triggers the job in its main method.

You can also combine all four elements into a single class; the mapper and reducer can become static inner classes in that case. For this example, though, you create four separate classes, one for each of the four elements just mentioned.
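To make the division of labor among the four elements concrete, here is a minimal plain-Java sketch of the four roles. It deliberately avoids the Hadoop and HBase APIs so that it runs on its own; every class name, the two-field line layout (symbol, closing price), and the sample values are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java stand-ins for the four roles. None of these types come from Hadoop or HBase.

/** Mapper role: turns one input line into zero or more key/value pairs. */
class SketchMapper {
    // Assumed (hypothetical) line layout: symbol,closingPrice
    List<Map.Entry<String, Double>> map(String line) {
        List<Map.Entry<String, Double>> out = new ArrayList<>();
        String[] fields = line.split(",");
        if (fields.length == 2) {
            out.add(Map.entry(fields[0].trim(), Double.parseDouble(fields[1].trim())));
        }
        return out;
    }
}

/** Reducer role: receives every value emitted for one key and aggregates them. */
class SketchReducer {
    // Aggregation chosen for this sketch: the maximum value per key.
    double reduce(String key, List<Double> values) {
        double max = Double.NEGATIVE_INFINITY;
        for (double v : values) {
            max = Math.max(max, v);
        }
        return max;
    }
}

/** Driver role: puts the mapper and the reducer together and runs the job. */
class SketchDriver {
    Map<String, Double> run(List<String> lines) {
        SketchMapper mapper = new SketchMapper();
        SketchReducer reducer = new SketchReducer();

        // Group the mapper's output by key (a framework does this step for you).
        Map<String, List<Double>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Double> kv : mapper.map(line)) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }

        // Hand each key and its values to the reducer.
        Map<String, Double> result = new TreeMap<>();
        for (Map.Entry<String, List<Double>> e : grouped.entrySet()) {
            result.put(e.getKey(), reducer.reduce(e.getKey(), e.getValue()));
        }
        return result;
    }
}

/** Launcher role: triggers the job from its main method. */
class SketchMain {
    public static void main(String[] args) {
        // Made-up input lines for illustration only.
        List<String> lines = List.of("AAA,10.5", "BBB,7.0", "AAA,12.0");
        System.out.println(new SketchDriver().run(lines));
    }
}
```

Keeping the roles in separate classes mirrors the four-class layout used in this example; folding the mapper and reducer into static inner classes of the driver would work equally well.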

I assume Hadoop and HBase are already installed and configured. Please add the following .jar files to your Java classpath to make the following example compile and run:

➤ hadoop-0.20.2-ant.jar

➤ hadoop-0.20.2-core.jar

➤ hadoop-0.20.2-tools.jar

➤ hbase-0.20.6.jar

The Hadoop jar files are available in the Hadoop distribution, and the HBase jar file comes with HBase.

The mapper looks like this:

package com.treasuryofideas.hbasemr;

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class NyseMarketDataMapper extends
