}
public static void printUsage(Tool tool, String extraArgsUsage) {
  System.err.printf("Usage: %s [genericOptions] %s\n\n",
      tool.getClass().getSimpleName(), extraArgsUsage);
  GenericOptionsParser.printGenericCommandUsage(System.err);
}
Going back to MinimalMapReduceWithDefaults in Example 8-1, although there are many other default job settings, the ones shown in bold are those most central to running a job. Let's go through them in turn.
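For reference, the same defaults can also be spelled out explicitly on a Job. The following sketch is not Example 8-1 verbatim, just an approximation listing the calls that correspond to the settings discussed in this section:

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.mapreduce.lib.partition.HashPartitioner;

public class ExplicitDefaults {
  // Sets on the given job the values it would otherwise use by default
  public static void configure(Job job) {
    job.setInputFormatClass(TextInputFormat.class);   // default input format
    job.setMapperClass(Mapper.class);                 // identity mapper
    job.setMapOutputKeyClass(LongWritable.class);
    job.setMapOutputValueClass(Text.class);
    job.setPartitionerClass(HashPartitioner.class);   // default partitioner
    job.setNumReduceTasks(1);                         // one reduce task by default
    job.setReducerClass(Reducer.class);               // identity reducer
    job.setOutputKeyClass(LongWritable.class);
    job.setOutputValueClass(Text.class);
    job.setOutputFormatClass(TextOutputFormat.class); // default output format
  }
}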
The default input format is TextInputFormat, which produces keys of type LongWritable (the offset of the beginning of the line in the file) and values of type Text (the line of text). This explains where the integers in the final output come from: they are the line offsets.
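As a quick illustration (the input lines here are made up, not taken from the book), the snippet below prints the (offset, line) pairs that TextInputFormat would hand to the mapper for a three-line ASCII file, where each line's offset is the previous offset plus the previous line's length plus one newline byte:

public class LineOffsets {
  public static void main(String[] args) {
    String[] lines = { "line one", "line two", "line three" };
    long offset = 0;
    for (String line : lines) {
      System.out.printf("(%d, %s)%n", offset, line);
      offset += line.length() + 1; // +1 for the trailing newline byte (ASCII assumed)
    }
    // Prints: (0, line one), (9, line two), (18, line three)
  }
}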
The default mapper is just the Mapper class, which writes the input key and value unchanged to the output:
public class Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT> {
  protected void map(KEYIN key, VALUEIN value,
      Context context) throws IOException, InterruptedException {
    context.write((KEYOUT) key, (VALUEOUT) value);
  }
}
Mapper is a generic type, which allows it to work with any key or value types. In this case, the map input and output key is of type LongWritable, and the map input and output value is of type Text.
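To make the generics concrete, a subclass binds the four type parameters. The hypothetical mapper below is not part of Hadoop or the book; it simply keeps the default identity behaviour while spelling out the types this job uses:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical example: same identity behaviour as the default Mapper,
// with the type parameters bound to LongWritable keys and Text values
public class IdentityLineMapper
    extends Mapper<LongWritable, Text, LongWritable, Text> {
  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    context.write(key, value); // pass the offset and line through unchanged
  }
}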
The default partitioner is HashPartitioner, which hashes a record's key to determine which partition the record belongs in. Each partition is processed by a reduce task, so the number of partitions is equal to the number of reduce tasks for the job:
public class HashPartitioner<K, V> extends Partitioner<K, V> {
  public int getPartition(K key, V value, int numReduceTasks) {
    return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
  }
}
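To see the arithmetic in action, the short sketch below (an illustration, not part of Hadoop; the key and reducer count are made up) applies the same expression to a sample Text key. The bitwise AND with Integer.MAX_VALUE clears the sign bit, so the modulo always yields a partition number between 0 and numReduceTasks - 1:

import org.apache.hadoop.io.Text;

public class PartitionDemo {
  public static void main(String[] args) {
    Text key = new Text("example-key"); // made-up key for the illustration
    int numReduceTasks = 4;             // assume a job with four reduce tasks
    // Same expression as HashPartitioner.getPartition()
    int partition = (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    System.out.println("Key goes to partition " + partition);
  }
}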