public static void printUsage(Tool tool, String extraArgsUsage) {
  System.err.printf("Usage: %s [genericOptions] %s\n\n",
      tool.getClass().getSimpleName(), extraArgsUsage);
  GenericOptionsParser.printGenericCommandUsage(System.err);
}
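In a typical driver, a helper like this is called from a Tool implementation's run() method when the wrong number of arguments is supplied. The sketch below is illustrative only: the driver class name, the two-argument check, and the "<input> <output>" usage string are assumptions, and printUsage is presumed to be in scope (defined in, or statically imported into, the same class).

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;

// Hypothetical driver skeleton showing where printUsage fits
public class MinimalDriver extends Configured implements Tool {
  @Override
  public int run(String[] args) throws Exception {
    if (args.length != 2) {
      printUsage(this, "<input> <output>"); // report usage on stderr, then bail out
      return -1;
    }
    // ... configure and submit the job here ...
    return 0;
  }
}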
There are many other default job settings; the ones bolded are those most central to running a job. Let's go through them in turn.
The default input format is TextInputFormat, which produces keys of type LongWritable (the offset of the beginning of the line in the file) and values of type Text (the line of text). This explains where the integers in the final output come from: they are the line offsets.
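As an illustration (a worked example of my own, not one from the text), a file containing the two lines

hello world
goodbye world

would be presented to the mapper as the records (0, hello world) and (12, goodbye world): the first line occupies bytes 0 through 10, the newline sits at byte 11, so the second line begins at offset 12.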
The default mapper is just the Mapper class, which writes the input key and value unchanged to the output:
public class Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT> {

  protected void map(KEYIN key, VALUEIN value,
                     Context context) throws IOException, InterruptedException {
    context.write((KEYOUT) key, (VALUEOUT) value);
  }
}
Mapper is a generic type, which allows it to work with any key or value types. In this case, the map input and output key is of type LongWritable, and the map input and output value is of type Text.
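To do anything useful, a job replaces this identity behavior by subclassing Mapper and registering the subclass with Job.setMapperClass(). The class below is a hypothetical illustration, not a listing from this chapter: it assumes the default TextInputFormat input types and emits each line together with its length in bytes.

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical mapper: for each input line, emit (line, byte length of line)
public class LineLengthMapper
    extends Mapper<LongWritable, Text, Text, IntWritable> {

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    // value.getLength() is the length of the line's UTF-8 encoding in bytes
    context.write(value, new IntWritable(value.getLength()));
  }
}

Because this mapper changes the map output types, the job would also need to declare them, with job.setMapOutputKeyClass(Text.class) and job.setMapOutputValueClass(IntWritable.class).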
The default partitioner is HashPartitioner, which hashes a record's key to determine which partition the record belongs in. Each partition is processed by a reduce task, so the number of partitions is equal to the number of reduce tasks for the job:
public class HashPartitioner<K, V> extends Partitioner<K, V> {

  public int getPartition(K key, V value,
                          int numReduceTasks) {
    return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
  }
}
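The bitwise AND with Integer.MAX_VALUE clears the sign bit of the hash code. This matters because hashCode() may return a negative value, and Java's % operator takes the sign of its left operand, so without the mask getPartition() could return a negative (and therefore invalid) partition number. A small standalone demonstration of the arithmetic (an illustration of my own, not part of the listing above):

public class PartitionDemo {
  public static void main(String[] args) {
    int numReduceTasks = 4;
    int hash = -7; // hashCode() is free to return negative values
    System.out.println(hash % numReduceTasks);                       // -3: not a valid partition
    System.out.println((hash & Integer.MAX_VALUE) % numReduceTasks); // 1: always in 0..3
  }
}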