Database Reference
In-Depth Information
Task Execution
Once a task has been assigned resources for a container on a particular node by the re-
source manager's scheduler, the application master starts the container by contacting the
node manager (steps 9a and 9b). The task is executed by a Java application whose main
class is YarnChild . Before it can run the task, it localizes the resources that the task
needs, including the job configuration and JAR file, and any files from the distributed
cache (step 10; see Distributed Cache ) . Finally, it runs the map or reduce task (step 11).
The YarnChild runs in a dedicated JVM, so that any bugs in the user-defined map and
reduce functions (or even in YarnChild ) don't affect the node manager — by causing it
to crash or hang, for example.
Each task can perform setup and commit actions, which are run in the same JVM as the
task itself and are determined by the OutputCommitter for the job (see Output Com-
mitters ) . For file-based jobs, the commit action moves the task output from a temporary
location to its final location. The commit protocol ensures that when speculative execution
is enabled (see Speculative Execution ) , only one of the duplicate tasks is committed and
the other is aborted.
Streaming
Streaming runs special map and reduce tasks for the purpose of launching the user-sup-
plied executable and communicating with it ( Figure 7-2 ).
The Streaming task communicates with the process (which may be written in any lan-
guage) using standard input and output streams. During execution of the task, the Java
process passes input key-value pairs to the external process, which runs it through the
user-defined map or reduce function and passes the output key-value pairs back to the
Java process. From the node manager's point of view, it is as if the child process ran the
map or reduce code itself.
Search WWH ::




Custom Search