The node manager also maintains a reference count for the number of tasks using each file
in the cache. Before a task runs, the reference count of each file it uses is incremented by 1;
after the task has run, the count is decremented by 1. Only when the file is not being used
(when the count reaches zero) is it eligible for deletion. When the node's cache exceeds a
certain size (10 GB by default), files are deleted using a least-recently-used policy to make
room for new files. The cache size may be changed by setting the configuration
property yarn.nodemanager.localizer.cache.target-size-mb.
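The interplay between reference counting and least-recently-used eviction can be sketched as a small in-memory model. This is a hypothetical illustration, not the node manager's actual code. The class RefCountedLruCache and its acquire/release methods are invented names. Sizes are given in bytes here, whereas the real property is expressed in megabytes. The sketch relies on a LinkedHashMap in access order, so iteration runs from least to most recently used, and it skips any file whose count is above zero.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of a node-local cache; NOT the node manager's actual implementation.
class RefCountedLruCache {
    // Per-file bookkeeping: size on disk and the number of tasks currently using it.
    private static final class CachedFile {
        final long sizeBytes;
        int refCount;

        CachedFile(long sizeBytes) {
            this.sizeBytes = sizeBytes;
        }
    }

    private final long targetSizeBytes;
    // accessOrder = true: iteration runs from least to most recently used.
    private final LinkedHashMap<String, CachedFile> entries = new LinkedHashMap<>(16, 0.75f, true);
    private long usedBytes;

    RefCountedLruCache(long targetSizeBytes) {
        this.targetSizeBytes = targetSizeBytes;
    }

    // Before a task runs: localize the file if absent, then increment its count.
    void acquire(String file, long sizeBytes) {
        CachedFile f = entries.get(file); // get() also marks the file as recently used
        if (f == null) {
            f = new CachedFile(sizeBytes);
            entries.put(file, f);
            usedBytes += sizeBytes;
        }
        f.refCount++;
        evictIfOverTarget();
    }

    // After a task has run: decrement the count (never below zero).
    // get() refreshes recency, so a just-finished file counts as recently used.
    void release(String file) {
        CachedFile f = entries.get(file);
        if (f != null && f.refCount > 0) {
            f.refCount--;
        }
    }

    // Delete unused files, least recently used first, until back within the target size.
    private void evictIfOverTarget() {
        Iterator<Map.Entry<String, CachedFile>> it = entries.entrySet().iterator();
        while (usedBytes > targetSizeBytes && it.hasNext()) {
            CachedFile candidate = it.next().getValue();
            if (candidate.refCount == 0) { // files still in use are never eligible
                usedBytes -= candidate.sizeBytes;
                it.remove();
            }
        }
    }

    boolean contains(String file) {
        return entries.containsKey(file); // containsKey() does not change access order
    }

    long usedBytes() {
        return usedBytes;
    }
}
```

Consider a cache with a 100-byte target. If files a (60 bytes) and b (30 bytes) are each used and released, and a task then acquires c (30 bytes), only a is deleted. It is the least recently used file with a count of zero. If every file is still in use, nothing is deleted and the cache temporarily exceeds its target.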
Although this design doesn't guarantee that subsequent tasks from the same job running
on the same node will find the file they need in the cache, it is very likely that they will:
tasks from a job are usually scheduled to run at around the same time, so there is little
opportunity for enough other jobs to run to cause the original task's file to be deleted from
the cache.
The distributed cache API
Most applications don't need to use the distributed cache API, because they can use the
cache via GenericOptionsParser, as we saw in Example 9-13. However, if
GenericOptionsParser is not being used, then the API in Job can be used to put
objects into the distributed cache.[66] Here are the pertinent methods in Job:
public void addCacheFile(URI uri)
public void addCacheArchive(URI uri)
public void setCacheFiles(URI[] files)
public void setCacheArchives(URI[] archives)
public void addFileToClassPath(Path file)
public void addArchiveToClassPath(Path archive)
Recall that there are two types of objects that can be placed in the cache: files and
archives. Files are left intact on the task node, whereas archives are unarchived on the task
node. For each type of object, there are three methods: an addCacheXXXX() method to
add the file or archive to the distributed cache, a setCacheXXXXs() method to set the
entire list of files or archives to be added to the cache in a single call (replacing those set
in any previous calls), and an addXXXXToClassPath() method to add the file or
archive to the MapReduce task's classpath. Table 9-7 compares these API methods to the
GenericOptionsParser options described in Table 6-1 .
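The difference between the addCacheXXXX() and setCacheXXXXs() styles is append versus replace. A toy model makes this concrete. CacheFileList is a hypothetical stand-in invented for illustration, not Hadoop's Job class. It reuses the method names only to mirror the semantics described above: adding extends the registered list, while setting discards anything registered by earlier calls.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the cache-file bookkeeping in Job; NOT Hadoop's class.
class CacheFileList {
    private final List<URI> files = new ArrayList<>();

    // Mirrors addCacheFile(URI): appends one entry to whatever is already registered.
    void addCacheFile(URI uri) {
        files.add(uri);
    }

    // Mirrors setCacheFiles(URI[]): replaces the whole list set by any previous calls.
    void setCacheFiles(URI[] uris) {
        files.clear();
        files.addAll(Arrays.asList(uris));
    }

    // Read-only view of the currently registered files, in registration order.
    List<URI> cacheFiles() {
        return Collections.unmodifiableList(files);
    }
}
```

In this model, two addCacheFile() calls followed by a setCacheFiles() call with a single URI leave only that one URI registered. A later addCacheFile() call then appends to it again.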