while (values.hasNext()) {
    V2 v = values.next();
    ...
}
The reduce() method is also given an OutputCollector to gather its key/value output, which is of type K3/V3. Somewhere in the reduce() method you'll call
output.collect((K3) k, (V3) v);
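To make the pattern concrete, here is a minimal sketch of a reducer written against the older org.apache.hadoop.mapred API used in the surrounding skeleton. The class name, the Text/IntWritable choices for K2/V2 and K3/V3, and the counting logic are illustrative assumptions, not part of the book's template; the point is only where the value iteration and output.collect() call fit.

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

// Hypothetical reducer: K2/V2 assumed to be Text/Text, K3/V3 Text/IntWritable.
public class SampleReduce extends MapReduceBase
    implements Reducer<Text, Text, Text, IntWritable> {

    public void reduce(Text key, Iterator<Text> values,
                       OutputCollector<Text, IntWritable> output,
                       Reporter reporter) throws IOException {
        int count = 0;
        while (values.hasNext()) {      // iterate over every V2 value for this K2 key
            values.next();
            count++;
        }
        output.collect(key, new IntWritable(count));   // emit the (K3, V3) pair
    }
}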
In addition to having consistent K2 and V2 types across Mapper and Reducer, you'll also need to ensure that the key and value types used in Mapper and Reducer are consistent with the input format, output key class, and output value class set in the driver. The use of KeyValueTextInputFormat means that K1 and V1 must both be of type Text. The driver must call setOutputKeyClass() and setOutputValueClass() with the classes of K2 and V2, respectively.
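A minimal driver sketch showing this wiring might look like the following. It uses the old org.apache.hadoop.mapred API and the stock IdentityMapper/IdentityReducer classes purely so the example compiles on its own; the class name MyDriver and the job name are assumptions for illustration, not the book's template.

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.KeyValueTextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.IdentityReducer;

// Hypothetical driver showing how the input format and output classes tie together.
public class MyDriver {
    public static void main(String[] args) throws Exception {
        JobConf job = new JobConf(MyDriver.class);
        job.setJobName("type-consistency example");

        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.setInputFormat(KeyValueTextInputFormat.class);  // so K1 and V1 are both Text
        job.setOutputFormat(TextOutputFormat.class);

        job.setMapperClass(IdentityMapper.class);   // placeholder mapper: Text/Text in and out
        job.setReducerClass(IdentityReducer.class); // placeholder reducer: Text/Text in and out

        job.setOutputKeyClass(Text.class);          // class of K2
        job.setOutputValueClass(Text.class);        // class of V2

        JobClient.runJob(job);
    }
}

If the map output types ever differ from the final output types, the old API also provides setMapOutputKeyClass() and setMapOutputValueClass() on JobConf to declare K2 and V2 separately.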
Finally, all the key and value types must be subtypes of Writable, which ensures a serialization interface for Hadoop to send the data around in a distributed cluster. In fact, the key types implement WritableComparable, a subinterface of Writable. The key types need to additionally support the compareTo() method, as keys are used for sorting in various places in the MapReduce framework.
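As an example of what such a key type looks like, here is a sketch of a hypothetical custom key implementing WritableComparable. The class PatentId and its single int field are assumptions for illustration; the patent examples in this chapter use the built-in Text and IntWritable types instead.

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.WritableComparable;

// Hypothetical custom key: an integer patent ID usable as a MapReduce key.
// WritableComparable supplies both serialization (write/readFields) and ordering (compareTo).
public class PatentId implements WritableComparable<PatentId> {
    private int id;

    public PatentId() {}                      // Hadoop needs a no-argument constructor
    public PatentId(int id) { this.id = id; }

    public void write(DataOutput out) throws IOException {
        out.writeInt(id);                     // serialize for transport across the cluster
    }

    public void readFields(DataInput in) throws IOException {
        id = in.readInt();                    // deserialize on the receiving node
    }

    public int compareTo(PatentId other) {    // used by the framework when sorting keys
        return Integer.compare(id, other.id);
    }
}

In practice you'd also override hashCode() (and equals()), since the default HashPartitioner uses the key's hash code to assign records to reducers.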
4.3 Counting things
Much of what the layperson thinks of as statistics is counting, and many basic Hadoop
jobs involve counting. We've already seen the word count
example in chapter 1. For
the patent citation data, we may want the number of citations a patent has received.
This too is counting. The desired output would look like this:
1 2
10000 1
100000 1
1000006 1
1000007 1
1000011 1
1000017 1
1000026 1
1000033 2
1000043 1
1000044 2
1000045 1
1000046 2
1000049 1
1000051 1
1000054 1
1000065 1
1000067 3
In each record, a patent number is associated with the number of citations it has received. We can write a MapReduce program for this task. As we said earlier, you hardly ever write a MapReduce program from scratch. You have an existing MapReduce