order by setting the HADOOP_USER_CLASSPATH_FIRST environment variable to
true. For the task classpath, you can set
mapreduce.job.user.classpath.first to true. Note that by setting these options
you change the class loading for Hadoop framework dependencies (but only in your job),
which could potentially cause the job submission or task to fail, so use these options with
caution.
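As a rough sketch of how the two settings might be applied together, the following shell fragment exports the client-side variable and builds a launch command that passes the task-side property with -D. It assumes the driver parses generic options (for example, by running through ToolRunner); the JAR, class, and paths are borrowed from this chapter's example job, and the command is only printed, not executed.

```shell
# Client side: ask the hadoop launcher to put user classes ahead of Hadoop's own.
export HADOOP_USER_CLASSPATH_FIRST=true

# Arguments borrowed from this chapter's example job (adjust for your own job).
JOB_ARGS="-conf conf/hadoop-cluster.xml input/ncdc/all max-temp"

# Task side: the same preference, passed as a job property with -D.
# Assumption: the driver parses generic options, e.g. via ToolRunner.
launch_cmd="hadoop jar hadoop-examples.jar v2.MaxTemperatureDriver -D mapreduce.job.user.classpath.first=true $JOB_ARGS"

# Dry run: print the command for review instead of executing it.
echo "$launch_cmd"
```

Printing the command first gives you a chance to check both settings before submitting, which is worthwhile given that either one can change how framework classes are loaded.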
Launching a Job
To launch the job, we need to run the driver, specifying the cluster that we want to run the
job on with the -conf option (we equally could have used the -fs and -jt options):
% unset HADOOP_CLASSPATH
% hadoop jar hadoop-examples.jar v2.MaxTemperatureDriver \
-conf conf/hadoop-cluster.xml input/ncdc/all max-temp
WARNING
We unset the HADOOP_CLASSPATH environment variable because we don't have any third-party
dependencies for this job. If it were left set to target/classes/ (from earlier in the chapter), Hadoop
wouldn't be able to find the job JAR; it would load the MaxTemperatureDriver class from
target/classes rather than the JAR, and the job would fail.
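For comparison, here is a minimal sketch of the same launch using the -fs and -jt options in place of -conf. The host names and ports are hypothetical placeholders, not values from this chapter, and the sketch assumes that any other client settings the cluster needs (such as the MapReduce framework name) are already configured. The command is printed rather than run.

```shell
# Clear any classpath left over from local runs so the driver is loaded from the job JAR.
unset HADOOP_CLASSPATH

# Hypothetical addresses: substitute the values for your own cluster.
NAMENODE_URI="hdfs://namenode.example.com:8020"
RESOURCEMANAGER="resourcemanager.example.com:8032"

# Same job as launched above, naming the cluster with -fs and -jt instead of -conf.
launch_cmd="hadoop jar hadoop-examples.jar v2.MaxTemperatureDriver -fs $NAMENODE_URI -jt $RESOURCEMANAGER input/ncdc/all max-temp"

# Dry run: print the command instead of executing it.
echo "$launch_cmd"
```

Keeping the cluster details in a configuration file, as the -conf form does, is usually easier to reuse across jobs; the explicit options are handy for one-off runs.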
The waitForCompletion() method on Job launches the job and polls for progress,
writing a line summarizing the map and reduce progress whenever either changes.
Here's the output (some lines have been removed for clarity):
14/09/12 06:38:11 INFO input.FileInputFormat: Total input paths to process : 101
14/09/12 06:38:11 INFO impl.YarnClientImpl: Submitted application application_1410450250506_0003
14/09/12 06:38:12 INFO mapreduce.Job: Running job: job_1410450250506_0003
14/09/12 06:38:26 INFO mapreduce.Job: map 0% reduce 0%
...
14/09/12 06:45:24 INFO mapreduce.Job: map 100% reduce 100%
14/09/12 06:45:24 INFO mapreduce.Job: Job job_1410450250506_0003 completed successfully
14/09/12 06:45:24 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=93995
FILE: Number of bytes written=10273563
FILE: Number of read operations=0