Running on a Cluster
Now that we are happy with the program running on a small test dataset, we are ready to try it on the full dataset on a Hadoop cluster. Chapter 10 covers how to set up a fully distributed cluster, although you can also work through this section on a pseudo-distributed cluster.
Packaging a Job
The local job runner uses a single JVM to run a job, so as long as all the classes that your job needs are on its classpath, things will just work.
In a distributed setting, things are a little more complex. For a start, a job's classes must be packaged into a job JAR file to send to the cluster. Hadoop will find the job JAR automatically by searching for the JAR on the driver's classpath that contains the class set in the setJarByClass() method (on JobConf or Job). Alternatively, if you want to set an explicit JAR file by its file path, you can use the setJar() method. (The JAR file path may be local or an HDFS file path.)
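To make the search concrete, here is a simplified sketch of how the JAR containing a given class can be located using only the JDK. The class name is turned into a resource path, the class loader is asked for matching resources, and the first `jar:` URL is reduced to a local file path. This illustrates the idea behind setJarByClass(); it is not Hadoop's source, and the `ContainingJarFinder` class and its methods are hypothetical names.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URL;
import java.util.Enumeration;

/**
 * Simplified sketch of finding the JAR on the classpath that contains a class.
 * Hypothetical helper for illustration only; not Hadoop's implementation.
 */
final class ContainingJarFinder {

    private ContainingJarFinder() {}

    /** Returns the local path of a JAR containing clazz, or null if none is found. */
    static String findContainingJar(Class<?> clazz) throws IOException {
        ClassLoader loader = clazz.getClassLoader();
        if (loader == null) {
            return null; // bootstrap classes such as java.lang.String have no visible loader
        }
        // e.g. com.example.MyDriver -> "com/example/MyDriver.class"
        String classFile = clazz.getName().replace('.', '/') + ".class";
        Enumeration<URL> resources = loader.getResources(classFile);
        while (resources.hasMoreElements()) {
            String jarPath = jarPathFromResource(resources.nextElement().toString());
            if (jarPath != null) {
                return jarPath; // first JAR on the classpath that holds the class
            }
        }
        return null; // the class was loaded from a directory, not a JAR
    }

    /**
     * Extracts the JAR path from a resource URL such as
     * "jar:file:/tmp/app.jar!/com/example/MyDriver.class".
     * Returns null for non-JAR URLs such as "file:/tmp/classes/...".
     */
    static String jarPathFromResource(String resourceUrl) {
        if (!resourceUrl.startsWith("jar:")) {
            return null;
        }
        int separator = resourceUrl.indexOf("!/");
        if (separator < 0) {
            return null;
        }
        // "file:/tmp/app.jar"; URI.getPath() also decodes escapes such as %20
        String jarUrl = resourceUrl.substring("jar:".length(), separator);
        return URI.create(jarUrl).getPath();
    }
}
```

If the driver runs from compiled classes in a directory, as is common inside an IDE, no `jar:` URL turns up and the sketch returns null; when submitting to a cluster from such a setup, naming the JAR explicitly with setJar() is the alternative described in the text.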
Creating a job JAR file is conveniently achieved using a build tool such as Ant or Maven. Given the POM in Example 6-3, the following Maven command will create a JAR file called hadoop-examples.jar in the project directory containing all of the compiled classes:
% mvn package -DskipTests
If you have a single job per JAR, you can specify the main class to run in the JAR file's
manifest. If the main class is not in the manifest, it must be specified on the command line
(as we will see shortly when we run the job).
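The manifest entry in question is the standard JAR Main-Class attribute. The following sketch uses only java.util.jar to write an otherwise empty JAR whose manifest names a hypothetical com.example.MyDriver, then reads the attribute back. In practice the build tool writes the manifest for you, so treat the `ManifestMainClass` helper as an illustration of what ends up in the JAR rather than part of the build.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.JarInputStream;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

/** Hypothetical helper showing where a JAR's main class is recorded. */
final class ManifestMainClass {

    private ManifestMainClass() {}

    /** Builds an in-memory JAR, with no class files, whose manifest names mainClass. */
    static byte[] jarWithMainClass(String mainClass) throws IOException {
        Manifest manifest = new Manifest();
        Attributes attrs = manifest.getMainAttributes();
        // Declare the manifest version, as the jar tool does
        attrs.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        attrs.put(Attributes.Name.MAIN_CLASS, mainClass);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (JarOutputStream jar = new JarOutputStream(bytes, manifest)) {
            // A real job JAR would add its class files here with jar.putNextEntry(...)
        }
        return bytes.toByteArray();
    }

    /** Returns the Main-Class attribute, or null if the JAR has no manifest or no entry. */
    static String readMainClass(byte[] jarBytes) throws IOException {
        try (JarInputStream jar = new JarInputStream(new ByteArrayInputStream(jarBytes))) {
            Manifest manifest = jar.getManifest();
            return manifest == null
                    ? null
                    : manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        }
    }
}
```

A JAR built this way carries its main class with it; one without the attribute leaves the class name to be supplied on the command line.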
Any dependent JAR files can be packaged in a lib subdirectory in the job JAR file, although there are other ways to include dependencies, discussed later. Similarly, resource files can be packaged in a classes subdirectory. (This is analogous to a Java Web application archive, or WAR, file, except in that case the JAR files go in a WEB-INF/lib subdirectory and classes go in a WEB-INF/classes subdirectory in the WAR file.)
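The lib and classes conventions can be checked with the JDK's java.util.jar API. The sketch below lists a job JAR's entries, picks out the dependency JARs under lib/, and reports whether a classes/ directory is present. The `JobJarLayout` helper and the entry names used with it are hypothetical, and treating only JARs sitting directly in lib/ (not in nested folders) as dependencies is an assumption of the sketch. It inspects the layout only and does not reproduce how Hadoop unpacks or loads the JAR.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarInputStream;

/** Hypothetical helper for inspecting a job JAR's lib/ and classes/ layout. */
final class JobJarLayout {

    private JobJarLayout() {}

    /** Lists the entry names of a JAR read from the stream (the manifest is not listed). */
    static List<String> entryNames(InputStream jarStream) throws IOException {
        List<String> names = new ArrayList<>();
        try (JarInputStream jar = new JarInputStream(jarStream)) {
            JarEntry entry;
            while ((entry = jar.getNextJarEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return names;
    }

    /** Dependency JARs placed directly in lib/, e.g. "lib/dep-a.jar". */
    static List<String> dependencyJars(List<String> entryNames) {
        List<String> jars = new ArrayList<>();
        for (String name : entryNames) {
            boolean directlyInLib = name.startsWith("lib/")
                    && name.indexOf('/', "lib/".length()) < 0;
            if (directlyInLib && name.endsWith(".jar")) {
                jars.add(name);
            }
        }
        return jars;
    }

    /** True if any entry lives under classes/, e.g. a packaged resource file. */
    static boolean hasClassesDir(List<String> entryNames) {
        for (String name : entryNames) {
            if (name.startsWith("classes/")) {
                return true;
            }
        }
        return false;
    }
}
```

For a JAR containing lib/dep-a.jar, lib/nested/dep-b.jar, and classes/job-config.xml, dependencyJars returns only lib/dep-a.jar and hasClassesDir returns true.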
The client classpath
The user's client-side classpath set by hadoop jar <jar> is made up of:
▪ The job JAR file
▪ Any JAR files in the lib directory of the job JAR file, and the classes directory (if present)
▪ The classpath defined by HADOOP_CLASSPATH, if set
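As a rough illustration of that list, the sketch below assembles the entries in the order given, assuming the job JAR has already been unpacked to a local directory so that its lib and classes contents can be referenced as ordinary files. The `ClientClasspath` helper, the directory, and the file names are hypothetical, and the ordering simply mirrors the list; the text does not spell out the precedence Hadoop applies. HADOOP_CLASSPATH is passed in as a parameter rather than read with System.getenv("HADOOP_CLASSPATH") so the sketch stays easy to test.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical sketch of the client-side classpath sources listed above. */
final class ClientClasspath {

    private ClientClasspath() {}

    static List<String> build(String jobJar, String unpackDir, List<String> libJarNames,
                              boolean hasClassesDir, String hadoopClasspath) {
        List<String> entries = new ArrayList<>();
        // 1. The job JAR file itself
        entries.add(jobJar);
        // 2. JARs from the job JAR's lib directory, then its classes directory if present
        File libDir = new File(unpackDir, "lib");
        for (String lib : libJarNames) {
            entries.add(new File(libDir, lib).getPath());
        }
        if (hasClassesDir) {
            entries.add(new File(unpackDir, "classes").getPath());
        }
        // 3. HADOOP_CLASSPATH, if set; its entries use the platform path separator
        if (hadoopClasspath != null && !hadoopClasspath.isEmpty()) {
            for (String entry : hadoopClasspath.split(Pattern.quote(File.pathSeparator))) {
                if (!entry.isEmpty()) {
                    entries.add(entry);
                }
            }
        }
        return entries;
    }

    /** Joins entries with the platform path separator (":" on Unix, ";" on Windows). */
    static String join(List<String> entries) {
        return String.join(File.pathSeparator, entries);
    }
}
```

With a job JAR hadoop-examples.jar unpacked to /tmp/unpacked, one library lib/dep-a.jar, a classes directory, and HADOOP_CLASSPATH set to two entries, build returns five entries: the JAR, the library, the classes directory, and the two HADOOP_CLASSPATH entries.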