Table 9-7. Distributed cache API

| Job API method | GenericOptionsParser equivalent | Description |
| --- | --- | --- |
| addCacheFile(URI uri), setCacheFiles(URI[] files) | -files file1,file2,... | Add files to the distributed cache to be copied to the task node. |
| addCacheArchive(URI uri), setCacheArchives(URI[] files) | -archives archive1,archive2,... | Add archives to the distributed cache to be copied to the task node and unarchived there. |
| addFileToClassPath(Path file) | -libjars jar1,jar2,... | Add files to the distributed cache to be added to the MapReduce task's classpath. The files are not unarchived, so this is a useful way to add JAR files to the classpath. |
| addArchiveToClassPath(Path archive) | None | Add archives to the distributed cache to be unarchived and added to the MapReduce task's classpath. This can be useful when you want to add a directory of files to the classpath, since you can create an archive containing the files. Alternatively, you could create a JAR file and use addFileToClassPath(), which works equally well. |
NOTE
The URIs referenced in the add or set methods must be files in a shared filesystem that exist when the
job is run. On the other hand, the filenames specified as a GenericOptionsParser option (e.g., -files)
may refer to local files, in which case they get copied to the default shared filesystem (normally
HDFS) on your behalf.
This is the key difference between using the Java API directly and using GenericOptionsParser:
the Java API does not copy the file specified in the add or set method to the shared filesystem, whereas
the GenericOptionsParser does.
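
Because the Java API does no copying, it can help to catch an obviously local URI before it is handed to an add or set method. The following is a minimal sketch of such a pre-flight check using only the JDK; it is not part of Hadoop. The class CacheUriCheck, its methods, the host name namenode, and the paths are all hypothetical, and the check only recognizes URIs that carry an explicit file scheme.

```java
import java.net.URI;

// Hypothetical helper, not a Hadoop class: flags URIs that explicitly
// point at the local filesystem before they are passed to an add/set method.
final class CacheUriCheck {

    // True only when the URI carries an explicit "file" scheme.
    // URIs with another scheme, or with no scheme at all, return false.
    static boolean isExplicitlyLocal(URI uri) {
        return "file".equalsIgnoreCase(uri.getScheme());
    }

    // Returns the URI unchanged, or throws if it is explicitly local.
    static URI requireNotLocal(URI uri) {
        if (isExplicitlyLocal(uri)) {
            throw new IllegalArgumentException(
                "Local file will not be copied to the shared filesystem: " + uri);
        }
        return uri;
    }

    public static void main(String[] args) {
        // Assumed example locations; substitute real ones.
        URI shared = URI.create("hdfs://namenode/cache/stations.txt");
        URI local = URI.create("file:///tmp/stations.txt");
        System.out.println(isExplicitlyLocal(shared));
        System.out.println(isExplicitlyLocal(local));
    }
}
```

A check like this does not prove that the file exists on the shared filesystem when the job runs; it only rejects the one case that is certain to be local.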
Retrieving distributed cache files from the task works in the same way as before: you access
the localized file directly by name, as we did in Example 9-13. This works because
MapReduce will always create a symbolic link from the task's working directory to every
file or archive added to the distributed cache. [67] Archives are unarchived so you can
access the files in them using the nested path.
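
The symbolic-link mechanism can be imitated outside Hadoop with plain JDK classes. The sketch below is not Hadoop code and is only an illustration: it creates a throwaway "working directory", links a file into it in the way described above, and then reads the file by its bare name. The class LocalizedCacheDemo, the file name stations.txt, and the sample lines are all hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Illustration only: simulates a localized cache file with JDK classes.
final class LocalizedCacheDemo {

    // Reads a file by a name relative to the (simulated) task working directory.
    // A nested name such as "archive.zip/inner.txt" would resolve the same way.
    static List<String> readByName(Path workingDir, String name) {
        try {
            return Files.readAllLines(workingDir.resolve(name));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Stand-in for localization: writes the file into a separate "cache"
    // directory, then symlinks it into a fresh "working directory".
    // Creating symbolic links may require extra privileges on some platforms.
    static Path simulateLocalization(String name, List<String> lines) {
        try {
            Path cacheDir = Files.createTempDirectory("cache");
            Path workingDir = Files.createTempDirectory("task-cwd");
            Path localized = Files.write(cacheDir.resolve(name), lines);
            Files.createSymbolicLink(workingDir.resolve(name), localized);
            return workingDir;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path cwd = simulateLocalization("stations.txt", List.of("alpha", "beta"));
        // The file lives in the cache directory but is reachable by name from cwd.
        System.out.println(readByName(cwd, "stations.txt")); // prints [alpha, beta]
    }
}
```

In a real task there is no need to pass a working directory around: the task's current directory already holds the links, so a relative file name is enough.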