The node manager also maintains a reference count for the number of tasks using each file
in the cache. Before the task has run, the file's reference count is incremented by 1; then,
after the task has run, the count is decreased by 1. Only when the file is not being used
(when the count reaches zero) is it eligible for deletion. Files are deleted to make room for
a new file when the node's cache exceeds a certain size (10 GB by default), using a
least-recently-used policy. The cache size may be changed by setting the configuration
property yarn.nodemanager.localizer.cache.target-size-mb.
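The interplay of reference counting and least-recently-used eviction described above can be sketched as a small toy model. This is not the node manager's actual implementation: the class RefCountedLruCache, its methods, and the file names are hypothetical, and sizes are tracked as plain numbers rather than real files on disk.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model (hypothetical, not Hadoop code) of a reference-counted LRU file cache. */
final class RefCountedLruCache {
    private static final class CachedFile {
        final long sizeMb;
        int refCount;
        CachedFile(long sizeMb) { this.sizeMb = sizeMb; }
    }

    // accessOrder = true: iteration runs from least to most recently used.
    private final Map<String, CachedFile> files = new LinkedHashMap<>(16, 0.75f, true);
    private final long targetSizeMb;
    private long usedMb;

    RefCountedLruCache(long targetSizeMb) { this.targetSizeMb = targetSizeMb; }

    /** Before a task runs: add the file if absent, then increment its count. */
    void acquire(String name, long sizeMb) {
        CachedFile f = files.get(name); // a hit also marks the file as recently used
        if (f == null) {
            f = new CachedFile(sizeMb);
            files.put(name, f);
            usedMb += sizeMb;
        }
        f.refCount++;
        evictIfOverTarget(); // make room only when a file is being brought in or reused
    }

    /** After a task has run: decrement the count so the file may become deletable. */
    void release(String name) {
        CachedFile f = files.get(name);
        if (f == null || f.refCount == 0) {
            throw new IllegalStateException("not in use: " + name);
        }
        f.refCount--;
    }

    /** Delete unused files, least recently used first, until under the target size. */
    private void evictIfOverTarget() {
        Iterator<Map.Entry<String, CachedFile>> it = files.entrySet().iterator();
        while (usedMb > targetSizeMb && it.hasNext()) {
            CachedFile f = it.next().getValue();
            if (f.refCount == 0) { // files still in use are never deleted
                usedMb -= f.sizeMb;
                it.remove();
            }
        }
    }

    boolean contains(String name) { return files.containsKey(name); }

    long usedMb() { return usedMb; }
}
```

In this sketch a file whose count is above zero survives eviction even when it is the least recently used entry, so the cache can temporarily exceed its target size if every file is still in use.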
Although this design doesn't guarantee that subsequent tasks from the same job running
on the same node will find the file they need in the cache, it is very likely that they will:
tasks from a job are usually scheduled to run at around the same time, so there isn't the
opportunity for enough other jobs to run to cause the original task's file to be deleted from
the cache.
The distributed cache API
Most applications don't need to use the distributed cache API, because they can use the
cache via GenericOptionsParser. If GenericOptionsParser is not being used, then the API in
Job can be used to put objects into the distributed cache:
public void addCacheFile(URI uri)
public void addCacheArchive(URI uri)
public void setCacheFiles(URI[] files)
public void setCacheArchives(URI[] archives)
public void addFileToClassPath(Path file)
public void addArchiveToClassPath(Path archive)
Recall that there are two types of objects that can be placed in the cache: files and
archives. Files are left intact on the task node, whereas archives are unarchived on the task
node. For each type of object, there are three methods: an addCacheXXXX() method to
add the file or archive to the distributed cache, a setCacheXXXXs() method to set the
entire list of files or archives to be added to the cache in a single call (replacing those set
in any previous calls), and an addXXXXToClassPath() method to add the file or
archive to the MapReduce task's classpath.
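The difference between the add and set methods is easiest to see side by side. The following is a minimal sketch using a hypothetical stand-in class, CacheFileList, that only mimics the list semantics described above. It is not Hadoop's Job class and it moves no files; the hdfs paths in the usage lines are made up for illustration.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for the cache-file list kept by a job; not Hadoop code. */
final class CacheFileList {
    private final List<URI> cacheFiles = new ArrayList<>();

    /** Mimics addCacheFile(URI): appends one file to the current list. */
    void addCacheFile(URI uri) {
        cacheFiles.add(uri);
    }

    /** Mimics setCacheFiles(URI[]): replaces whatever earlier calls set. */
    void setCacheFiles(URI[] files) {
        cacheFiles.clear();
        cacheFiles.addAll(Arrays.asList(files));
    }

    /** Returns a copy so callers cannot modify the internal list. */
    List<URI> getCacheFiles() {
        return new ArrayList<>(cacheFiles);
    }

    public static void main(String[] args) {
        CacheFileList list = new CacheFileList();
        list.addCacheFile(URI.create("hdfs:///data/a.txt")); // hypothetical path
        list.addCacheFile(URI.create("hdfs:///data/b.txt")); // hypothetical path
        System.out.println(list.getCacheFiles().size());     // two files so far

        // The set call discards both earlier additions.
        list.setCacheFiles(new URI[] { URI.create("hdfs:///data/c.txt") });
        System.out.println(list.getCacheFiles());
    }
}
```

Under these assumptions, calling the set method after several add calls leaves only the files passed to the set call, which is why the add methods are the safer choice when files are registered from more than one place.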
Table 9-7 compares these API methods to the GenericOptionsParser options described in
Table 6-1.