The lower part of the table shows the total number of failed and killed task attempts for
the map or reduce tasks. Task attempts may be marked as killed if they are speculative
execution duplicates, if the node they are running on dies, or if they are killed by a user. See
Task Failure for background on task failure.
There are also a number of useful links in the navigation. For example, the "Configuration"
link points to the consolidated configuration file for the job, containing all the properties
and their values that were in effect during the job run. If you are unsure of what a particular
property was set to, you can click through to inspect the file.
Retrieving the Results
Once the job is finished, there are various ways to retrieve the results. Each reducer
produces one output file, so there are 30 part files named part-r-00000 to part-r-00029 in the
max-temp directory.
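Before copying anything, a simple directory listing confirms what the reducers wrote. This assumes the job's output directory is max-temp, as above:

% hadoop fs -ls max-temp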
NOTE
As their names suggest, a good way to think of these “part” files is as parts of the max-temp “file.”
If the output is large (which it isn't in this case), it is important to have multiple parts so that more than
one reducer can work in parallel. Usually, if a file is in this partitioned form, it can still be used easily
enough, as the input to another MapReduce job, for example; a sketch of this follows the note. In some cases,
you can exploit the structure of multiple partitions to do a map-side join (see Map-Side Joins).
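To make the "input to another job" point concrete, here is a minimal sketch that feeds the whole max-temp directory to a second job using Hadoop Streaming. The streaming jar location varies by installation, and the output directory name and the trivial cat/wc mapper and reducer are placeholders for illustration, not part of the original example:

% hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
    -input max-temp \
    -output max-temp-counts \
    -mapper /bin/cat \
    -reducer /usr/bin/wc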
This job produces a very small amount of output, so it is convenient to copy it from HDFS
to our development machine. The -getmerge option to the hadoop fs command is
useful here, as it gets all the files in the directory specified in the source pattern and
merges them into a single file on the local filesystem:
% hadoop fs -getmerge max-temp max-temp-local
% sort max-temp-local | tail
1991 607
1992 605
1993 567
1994 568
1995 567
1996 561
1997 565
1998 568
1999 568
2000 558
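If you prefer not to create a local copy at all, an alternative for small outputs is to stream the part files directly with the -cat option. This is a sketch that assumes the part-r-* naming shown above; the quotes stop the local shell from expanding the glob so that hadoop fs handles it against HDFS:

% hadoop fs -cat 'max-temp/part-r-*' | sort | tail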