while (values.hasNext()) {
    V2 v = values.next();
    ...
}
The reduce() method is also given an OutputCollector to gather its key/value output, which is of type K3/V3. Somewhere in the reduce() method you'll call
output.collect((K3) k, (V3) v);
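To see the shape of this loop end to end, here is a minimal plain-Java analogue with no Hadoop dependency. The Collector interface is a hypothetical stand-in for Hadoop's OutputCollector, and the summing logic is just one illustrative choice of reduce.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for Hadoop's OutputCollector<K3, V3>;
// the name and shape here are illustrative, not part of the Hadoop API.
interface Collector<K, V> {
    void collect(K key, V value);
}

public class ReduceSketch {
    // Shape of a reduce(): iterate the grouped values for one key,
    // then emit a single (K3, V3) pair through the collector.
    static void reduce(String key, Iterator<Integer> values,
                       Collector<String, Integer> output) {
        int sum = 0;
        while (values.hasNext()) {
            int v = values.next();
            sum += v;
        }
        output.collect(key, sum);
    }

    public static void main(String[] args) {
        List<String> out = new ArrayList<>();
        reduce("1000067", List.of(1, 1, 1).iterator(),
               (k, v) -> out.add(k + "," + v));
        System.out.println(out.get(0)); // 1000067,3
    }
}
```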
In addition to having consistent K2 and V2 types across Mapper and Reducer, you'll also need to ensure that the key and value types used in Mapper and Reducer are consistent with the input format, output key class, and output value class set in the driver.
The use of KeyValueTextInputFormat means that K1 and V1 must both be of type Text.
The driver must call setOutputKeyClass() and setOutputValueClass() with the classes of K2 and V2, respectively.
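As a configuration sketch, the relevant driver calls might look like the fragment below (using the older org.apache.hadoop.mapred API). The choice of IntWritable for V2 is an assumption for illustration; your job's actual K2/V2 types go here.

```java
// Driver configuration fragment (old org.apache.hadoop.mapred API).
// With KeyValueTextInputFormat, K1 and V1 are both Text; here we assume
// K2 = Text and V2 = IntWritable, so the driver declares exactly those.
JobConf job = new JobConf(conf, MyJob.class);
job.setInputFormat(KeyValueTextInputFormat.class);
job.setOutputKeyClass(Text.class);          // class of K2
job.setOutputValueClass(IntWritable.class); // class of V2
```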
Finally, all the key and value types must be subtypes of Writable, which ensures a serialization interface for Hadoop to send the data around in a distributed cluster. In fact, the key types implement WritableComparable, a subinterface of Writable. The key types need to additionally support the compareTo() method, as keys are used for sorting in various places in the MapReduce framework.
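To illustrate why compareTo() matters, here is a plain-Java sketch of a key type being sorted, the way the shuffle/sort phase orders keys before handing them to reducers. PatentKey is a hypothetical class for illustration; a real Hadoop key would implement WritableComparable rather than plain Comparable.

```java
import java.util.Arrays;

// Illustrative key type: the framework sorts keys with compareTo()
// before grouping them for the reducer. A real Hadoop key would
// implement WritableComparable, which adds serialization on top.
class PatentKey implements Comparable<PatentKey> {
    final int patentNumber;

    PatentKey(int n) { patentNumber = n; }

    @Override
    public int compareTo(PatentKey other) {
        return Integer.compare(patentNumber, other.patentNumber);
    }

    @Override
    public String toString() { return Integer.toString(patentNumber); }
}

public class SortDemo {
    public static void main(String[] args) {
        PatentKey[] keys = {
            new PatentKey(1000067), new PatentKey(10000), new PatentKey(1)
        };
        Arrays.sort(keys); // uses compareTo(), as the sort phase would
        System.out.println(Arrays.toString(keys)); // [1, 10000, 1000067]
    }
}
```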
4.3
Counting things
Much of what the layperson thinks of as statistics is counting, and many basic Hadoop jobs involve counting. We've already seen the word count example in chapter 1. For the patent citation data, we may want the number of citations a patent has received. This too is counting. The desired output would look like this:
1 2
10000 1
100000 1
1000006 1
1000007 1
1000011 1
1000017 1
1000026 1
1000033 2
1000043 1
1000044 2
1000045 1
1000046 2
1000049 1
1000051 1
1000054 1
1000065 1
1000067 3
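The counting itself is just a group-by over the cited patent. As a plain-Java sketch of the logic the MapReduce job performs, assuming input lines of the form "citing,cited" (the sample data below is made up for illustration): the map phase would emit (cited, 1) and the reduce phase would sum the ones; here we collapse both steps into a single in-memory map.

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch of the citation-count logic: given "citing,cited" pairs,
// count how many times each cited patent appears. In MapReduce terms,
// map emits (cited, 1) and reduce sums the 1s per key.
public class CitationCount {
    static Map<String, Integer> count(String[] citations) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted by key
        for (String line : citations) {
            String cited = line.split(",")[1];
            counts.merge(cited, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] data = {
            "3858241,956203", "3858241,1324234", "3858242,956203"
        };
        System.out.println(count(data)); // {1324234=1, 956203=2}
    }
}
```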
In each record, a patent number is associated with the number of citations it has received. We can write a MapReduce program for this task. As we said earlier, you hardly ever write a MapReduce program from scratch. You have an existing MapReduce