void map(K1 key,
V1 value,
OutputCollector<K2,V2> output,
Reporter reporter
) throws IOException
The function generates a (possibly empty) list of (K2, V2) pairs for a given (K1, V1)
input pair. The OutputCollector receives the output of the mapping process, and
the Reporter provides the option to record extra information about the mapper as
the task progresses.
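The contract described above can be sketched in plain Java without a Hadoop installation. This is only an illustration: the Collector interface and ListCollector class are hypothetical stand-ins for Hadoop's OutputCollector, the Reporter argument is left out, and the word-splitting logic is an assumed example of what a mapper might do.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.AbstractMap.SimpleEntry;

// Plain-Java sketch of the map() contract. Nothing here is a Hadoop class.
class MapSketch {

    // Hypothetical stand-in for OutputCollector<K2,V2>.
    interface Collector<K, V> {
        void collect(K key, V value);
    }

    // Collector that remembers every emitted pair in emission order.
    static class ListCollector<K, V> implements Collector<K, V> {
        final List<Map.Entry<K, V>> pairs = new ArrayList<>();

        @Override
        public void collect(K key, V value) {
            pairs.add(new SimpleEntry<>(key, value));
        }
    }

    // K1 = line offset, V1 = line of text, K2 = word, V2 = the count 1.
    // A blank line emits nothing, which is the "possibly empty" case.
    static void map(Long offset, String line, Collector<String, Integer> output) {
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                output.collect(word, 1);
            }
        }
    }

    public static void main(String[] args) {
        ListCollector<String, Integer> out = new ListCollector<>();
        map(0L, "to be or not to be", out);
        System.out.println(out.pairs);
    }
}
```

One input pair produces zero or more output pairs, and the function never returns them directly; everything goes through the collector.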
Hadoop provides a few useful mapper implementations. You can see some of them
in table 3.2.
Table 3.2 Some useful Mapper implementations predefined by Hadoop
Class                  Description
IdentityMapper<K,V>    Implements Mapper<K,V,K,V> and maps inputs directly to outputs
InverseMapper<K,V>     Implements Mapper<K,V,V,K> and reverses the key/value pair
RegexMapper<K>         Implements Mapper<K,Text,Text,LongWritable> and generates a (match, 1) pair for every regular expression match
TokenCountMapper<K>    Implements Mapper<K,Text,Text,LongWritable> and generates a (token, 1) pair when the input value is tokenized
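To make the descriptions in the table concrete, the following plain-Java sketch imitates what InverseMapper and RegexMapper emit. The class and method names are hypothetical helpers, not the Hadoop classes themselves, and the sketch assumes the whole regular expression match is used as the output key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.AbstractMap.SimpleEntry;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Imitation of the output of two predefined mappers. Hypothetical helpers only.
class PredefinedMapperSketch {

    // InverseMapper behavior: (key, value) becomes (value, key).
    static <K, V> Map.Entry<V, K> inverse(K key, V value) {
        return new SimpleEntry<>(value, key);
    }

    // RegexMapper behavior: one (match, 1) pair per regular expression match.
    // Assumption: the full match is the key.
    static List<Map.Entry<String, Long>> regexMatches(String pattern, String value) {
        List<Map.Entry<String, Long>> pairs = new ArrayList<>();
        Matcher m = Pattern.compile(pattern).matcher(value);
        while (m.find()) {
            pairs.add(new SimpleEntry<>(m.group(), 1L));
        }
        return pairs;
    }

    public static void main(String[] args) {
        System.out.println(inverse("alice", 42));
        System.out.println(regexMatches("[0-9]+", "ports 80 and 443"));
    }
}
```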
As the MapReduce name implies, the major data flow operation after map is the
reduce phase, shown in the bottom part of figure 3.1.
3.2.3 Reducer
As with any mapper implementation, a reducer must first extend the MapReduceBase
class to allow for configuration and cleanup. In addition, it must also implement the
Reducer interface, which has the following single method:
void reduce(K2 key,
Iterator<V2> values,
OutputCollector<K3,V3> output,
Reporter reporter
) throws IOException
When the reducer task receives the output from the various mappers, it sorts the
incoming data on the key of the (key/value) pair and groups together all values of
the same key. The reduce() function is then called once per key, and it generates a
(possibly empty) list of (K3, V3) pairs by iterating over the values associated with that key. The
OutputCollector receives the output of the reduce process and writes it to an output
file. The Reporter provides the option to record extra information about the reducer
as the task progresses.
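The sort, group, and reduce sequence described above can be imitated in plain Java. This is a minimal sketch under stated assumptions: ReduceSketch, groupByKey, and run are hypothetical names, a TreeMap stands in for the framework's sort on keys, the Reporter is omitted, and summing the values is just one example of what a reducer might compute.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.AbstractMap.SimpleEntry;

// Plain-Java imitation of the reduce side. Nothing here is a Hadoop class.
class ReduceSketch {

    // Sort on the key and collect all values that share a key.
    static Map<String, List<Integer>> groupByKey(List<Map.Entry<String, Integer>> mapped) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : mapped) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Called once per key: iterate over the values and emit one (K3, V3) pair.
    static Map.Entry<String, Integer> reduce(String key, Iterator<Integer> values) {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next();
        }
        return new SimpleEntry<>(key, sum);
    }

    // Drive the whole sequence: group the mapper output, then reduce each group.
    static List<Map.Entry<String, Integer>> run(List<Map.Entry<String, Integer>> mapped) {
        List<Map.Entry<String, Integer>> output = new ArrayList<>();
        for (Map.Entry<String, List<Integer>> group : groupByKey(mapped).entrySet()) {
            output.add(reduce(group.getKey(), group.getValue().iterator()));
        }
        return output;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> mapped = new ArrayList<>();
        for (String word : "to be or not to be".split(" ")) {
            mapped.add(new SimpleEntry<>(word, 1));
        }
        System.out.println(run(mapped));
    }
}
```

The reducer only ever sees an Iterator over the values for one key, which mirrors the Iterator<V2> parameter in the signature.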
Table 3.3 lists a couple of basic reducer implementations provided by Hadoop.
 