MAPREDUCE WITH HBASE
Next, you upload the NYSE data set into an HBase instance. This time, use MapReduce itself to
parse the .csv files and populate HBase with the data. Such “chained” usage of MapReduce, where one
job loads the data and a later job analyzes it, is quite popular and works well for parsing large
files. Once the data is in HBase, you can use MapReduce a second time to run a few aggregate
queries. Two examples of MapReduce have already been illustrated; this third one should reinforce
the concept and demonstrate that MapReduce suits a variety of situations.
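The two chained passes can be illustrated without a cluster. The following is a minimal, hypothetical plain-Java simulation, not Hadoop or HBase code: a TreeMap stands in for the HBase table, the first pass is a map-only step that parses each .csv line into a row keyed by symbol and date, and the second pass maps rows to (symbol, close) pairs and reduces them to an average closing price per symbol. The class and method names, the row-key format, and the column order (exchange, symbol, date, open, high, low, close, volume, adjusted close) are assumptions made for illustration, and the sample rows are made up.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory simulation of two chained MapReduce passes.
// A TreeMap stands in for an HBase table: row key -> (column -> value).
class ChainedJobSketch {

    // Pass 1 (map-only): parse CSV lines and "put" each one as a table row.
    // Assumed columns: exchange,symbol,date,open,high,low,close,volume,adj_close
    static TreeMap<String, Map<String, String>> loadPass(List<String> csvLines) {
        TreeMap<String, Map<String, String>> table = new TreeMap<>();
        for (String line : csvLines) {
            String[] f = line.split(",");
            if (f.length < 9 || f[0].equals("exchange")) {
                continue; // skip the header row and malformed lines
            }
            Map<String, String> row = new HashMap<>();
            row.put("symbol", f[1]);
            row.put("close", f[6]);
            row.put("volume", f[7]);
            // Hypothetical row key: symbol plus date, so one symbol's rows sort together.
            table.put(f[1] + ":" + f[2], row);
        }
        return table;
    }

    // Pass 2 (map + reduce): read rows back and compute the average close per symbol.
    static Map<String, Double> aggregatePass(TreeMap<String, Map<String, String>> table) {
        // Map phase: emit (symbol, close) pairs, grouped by key as a shuffle would do.
        Map<String, List<Double>> grouped = new TreeMap<>();
        for (Map<String, String> row : table.values()) {
            grouped.computeIfAbsent(row.get("symbol"), k -> new ArrayList<>())
                   .add(Double.parseDouble(row.get("close")));
        }
        // Reduce phase: one call per key, averaging that key's values.
        Map<String, Double> averages = new TreeMap<>();
        for (Map.Entry<String, List<Double>> e : grouped.entrySet()) {
            double sum = 0;
            for (double v : e.getValue()) {
                sum += v;
            }
            averages.put(e.getKey(), sum / e.getValue().size());
        }
        return averages;
    }

    public static void main(String[] args) {
        // Made-up illustrative rows, not taken from the NYSE data set.
        List<String> csv = List.of(
            "exchange,stock_symbol,date,open,high,low,close,volume,adj_close",
            "NYSE,AAA,2010-01-04,10.0,11.0,9.0,10.0,100,10.0",
            "NYSE,AAA,2010-01-05,10.0,12.5,9.5,12.0,200,12.0",
            "NYSE,BBB,2010-01-04,5.0,5.5,4.5,5.0,300,5.0");
        System.out.println(aggregatePass(loadPass(csv))); // prints {AAA=11.0, BBB=5.0}
    }
}
```

Putting the symbol first in the row key is one plausible design choice: because HBase keeps rows sorted by row key, all rows for one symbol end up next to each other, which suits per-symbol scans in the second pass.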
To use MapReduce with HBase, you can use Java as the programming language of choice. It's not the only
option, though. You could also write MapReduce jobs in Python, Ruby, or PHP and use HBase as the source
and/or sink for the job. In this example, I create four program elements that need to work together:
A mapper class that emits key/value pairs.
A reducer class that takes the values emitted from the mapper and manipulates them to create
aggregations. In the data upload example, the mapper simply inserts the data into an
HBase table.
A driver class that wires the mapper class and the reducer class together into a job.
A class that triggers the job in its main method.
You can also combine all four elements into a single class, in which case the mapper and reducer
become static nested classes. For this example, though, you create four separate classes,
one for each of the four elements just mentioned.
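As a rough sketch of the single-class alternative, the following plain-Java program packs all four roles into one class: the mapper, reducer, and driver are static nested classes, and main triggers the job. It only mirrors the structure; it does not use Hadoop's Mapper or Reducer API, and the driver's grouping step stands in for the shuffle that Hadoop performs between the map and reduce phases. The class names, the maximum-volume aggregation, the assumed column order (volume in the eighth field), and the sample rows are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical plain-Java sketch of the four roles packed into one class.
// It mirrors the structure only; it does not use Hadoop's Mapper/Reducer API.
class NyseMaxVolumeJob {

    // Role 1: the mapper turns one input record into a (key, value) pair.
    static class VolumeMapper {
        // Returns null for records it cannot parse, so the driver can skip them.
        Map.Entry<String, Long> map(String csvLine) {
            String[] f = csvLine.split(",");
            if (f.length < 9) {
                return null;
            }
            try {
                return Map.entry(f[1], Long.parseLong(f[7])); // (symbol, volume)
            } catch (NumberFormatException e) {
                return null; // header row or bad volume field
            }
        }
    }

    // Role 2: the reducer aggregates all values emitted for one key.
    static class MaxReducer {
        // symbol is unused here but mirrors a reducer's (key, values) signature.
        long reduce(String symbol, List<Long> volumes) {
            long max = Long.MIN_VALUE;
            for (long v : volumes) {
                max = Math.max(max, v);
            }
            return max;
        }
    }

    // Role 3: the driver wires mapper and reducer together and plays the
    // part of the framework's shuffle by grouping values by key.
    static class Driver {
        private final VolumeMapper mapper = new VolumeMapper();
        private final MaxReducer reducer = new MaxReducer();

        Map<String, Long> run(List<String> input) {
            Map<String, List<Long>> groups = new TreeMap<>();
            for (String line : input) {
                Map.Entry<String, Long> kv = mapper.map(line);
                if (kv != null) {
                    groups.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
                }
            }
            Map<String, Long> output = new TreeMap<>();
            for (Map.Entry<String, List<Long>> g : groups.entrySet()) {
                output.put(g.getKey(), reducer.reduce(g.getKey(), g.getValue()));
            }
            return output;
        }
    }

    // Role 4: main triggers the job.
    public static void main(String[] args) {
        // Made-up illustrative rows, not taken from the NYSE data set.
        List<String> input = List.of(
            "exchange,stock_symbol,date,open,high,low,close,volume,adj_close",
            "NYSE,AAA,2010-01-04,10.0,11.0,9.0,10.0,100,10.0",
            "NYSE,AAA,2010-01-05,10.0,12.5,9.5,12.0,200,12.0",
            "NYSE,BBB,2010-01-04,5.0,5.5,4.5,5.0,300,5.0");
        System.out.println(new Driver().run(input)); // prints {AAA=200, BBB=300}
    }
}
```

Keeping the roles as separate classes, as this example does, makes each one easier to reuse and test on its own; nesting them trades that for having the whole job in one file.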
I assume Hadoop and HBase are already installed and configured. Add the following .jar
files to your Java classpath so that the example compiles and runs:
hadoop-0.20.2-ant.jar
hadoop-0.20.2-core.jar
hadoop-0.20.2-tools.jar
hbase-0.20.6.jar
The Hadoop .jar files are available in the Hadoop distribution, and the HBase .jar file comes with HBase.
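Before compiling, it can help to confirm that the jars are actually visible to the JVM. The hypothetical helper below tries to load a few class names with Class.forName and reports any it cannot find. The two class names checked in main are assumptions about what the jars listed above provide: org.apache.hadoop.mapreduce.Mapper is the class the mapper below imports, and org.apache.hadoop.hbase.client.HTable is HBase's client table class.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reports which of the given classes cannot be loaded
// from the current classpath.
class ClasspathCheck {

    static List<String> missing(String... classNames) {
        List<String> absent = new ArrayList<>();
        ClassLoader loader = ClasspathCheck.class.getClassLoader();
        for (String name : classNames) {
            try {
                // initialize = false: locate and load the class without running
                // its static initializers.
                Class.forName(name, false, loader);
            } catch (ClassNotFoundException | LinkageError e) {
                // Not found, or found but a class it depends on is missing.
                absent.add(name);
            }
        }
        return absent;
    }

    public static void main(String[] args) {
        List<String> absent = missing(
            "org.apache.hadoop.mapreduce.Mapper",     // assumed to come from the Hadoop core jar
            "org.apache.hadoop.hbase.client.HTable"); // assumed to come from the HBase jar
        if (absent.isEmpty()) {
            System.out.println("Hadoop and HBase classes found on the classpath.");
        } else {
            System.out.println("Missing from classpath: " + absent);
        }
    }
}
```

Loading with initialize set to false keeps the check cheap and side-effect free; a class that is present but whose own dependencies are absent is also reported, because that surfaces as a LinkageError.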
The mapper is like so:
package com.treasuryofideas.hbasemr;
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
public class NyseMarketDataMapper extends