The primary components and their focus areas are:
1. ResourceManager (RM), which has two main components:
Scheduler:
- The Scheduler allocates resources to the various running applications, subject to
constraints of capacity, availability, and resource queues.
- The Scheduler performs scheduling only. It schedules in terms of resource containers,
which specify memory, disk, and CPU.
- The Scheduler does not take responsibility for restarting failed tasks, whether they fail
because of an application failure or a hardware failure.
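The Scheduler behavior described above can be sketched as follows. This is a minimal illustration under stated assumptions, not YARN's actual scheduling code: the `Queue` class, the `allocate` function, and the first-fit placement are hypothetical, and the sketch models memory only, although the text notes that containers also specify disk and CPU.

```python
from dataclasses import dataclass


@dataclass
class Queue:
    """A hypothetical resource queue with a memory capacity limit."""
    name: str
    capacity_mb: int
    used_mb: int = 0


def allocate(queue, free_mb_by_node, request_mb):
    """Return the node chosen for the container, or None if it cannot be placed.

    The request is granted only if the queue stays within its capacity and
    some node has enough free memory (first fit, in dict order; an assumption).
    """
    if queue.used_mb + request_mb > queue.capacity_mb:
        return None  # queue capacity constraint would be violated
    for node, free_mb in free_mb_by_node.items():
        if free_mb >= request_mb:
            free_mb_by_node[node] = free_mb - request_mb
            queue.used_mb += request_mb
            return node
    return None  # no node currently has enough free memory


# Illustrative values only.
queue = Queue(name="default", capacity_mb=4096)
nodes = {"node1": 2048, "node2": 4096}
first = allocate(queue, nodes, 2048)   # fits on node1
second = allocate(queue, nodes, 2048)  # node1 is full, so node2 is used
third = allocate(queue, nodes, 1024)   # rejected: queue capacity is exhausted
```

Note that the sketch contains no retry or restart logic, which matches the point that the Scheduler leaves failed tasks to other components.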
Application Manager:
- Responsible for accepting job submissions.
- Negotiates the first container for executing the application-specific AM.
- Provides the service for restarting the AM container on failure.
- The Application Manager has three subcomponents:
a. Scheduler Negotiator: negotiates the resources for the AM with the Scheduler.
b. AMContainer Manager: starts and stops the container of the AM by talking to the
appropriate NodeManager.
c. AM Monitor: monitors the liveness of the AM and restarts the AM if necessary.
The ResourceManager stores snapshots of its state in ZooKeeper. After a failure, it can be
restarted transparently from the saved state, which preserves availability.
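The liveness check performed by the AM Monitor can be sketched as below. The text does not say how liveness is determined, so the heartbeat timestamps, the expiry interval, and the function name `find_expired_ams` are all assumptions made for illustration.

```python
def find_expired_ams(last_heartbeat, now, expiry_s):
    """Return the IDs of AMs whose last heartbeat is older than expiry_s seconds.

    last_heartbeat maps a hypothetical application ID to the time (in seconds)
    at which its AM was last heard from.
    """
    return sorted(
        app_id
        for app_id, seen in last_heartbeat.items()
        if now - seen > expiry_s
    )


# Illustrative values only.
heartbeats = {"app_1": 100.0, "app_2": 40.0, "app_3": 95.0}
expired = find_expired_ams(heartbeats, now=110.0, expiry_s=60.0)
```

In this sketch, each ID in `expired` would be a candidate for an AM restart through the AMContainer Manager.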
2. NodeManager:
- The NodeManager is a per-machine agent responsible for launching containers for
applications once the Scheduler has allocated them to the application.
- Monitors container resource usage to ensure that containers do not exceed their
allocated resource slices on the machine.
- Sets up the environment of the container for task execution, including binaries, libraries,
and jars.
- Manages local storage on the node. Applications can continue to use the local storage even
when they do not have an active allocation on the node, which supports scalability and
availability.
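The container resource monitoring described above amounts to comparing observed usage against each container's allocation. A minimal sketch, assuming memory is the only tracked resource and using hypothetical container IDs and a hypothetical `containers_over_limit` helper:

```python
def containers_over_limit(allocated_mb, used_mb):
    """Return the IDs of containers whose memory use exceeds their allocation.

    Containers with no usage sample yet are treated as using 0 MB.
    """
    return sorted(
        cid
        for cid, limit in allocated_mb.items()
        if used_mb.get(cid, 0) > limit
    )


# Illustrative values only.
allocated = {"c1": 1024, "c2": 2048, "c3": 1024}
observed = {"c1": 900, "c2": 2500}
offenders = containers_over_limit(allocated, observed)
```

What happens to an offending container (for example, whether it is stopped) is not stated in the text and is left out of the sketch.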
3. ApplicationMaster (AM):
- One instance runs per application.
- Negotiates resources with the RM.
- Manages application scheduling and task execution with the NodeManagers.
- Recovers the application after its own failure. It either restores the application from the
saved persistent state or, if that recovery does not succeed, runs the application again from
the very beginning.
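The recovery choice described above (resume from saved state, or fall back to a full rerun) can be sketched as follows. The JSON file format, the `completed` key, and the function names are hypothetical; the text does not describe how an AM actually persists its state.

```python
import json
import os
import tempfile


def load_saved_state(path):
    """Return the saved state as a dict, or None if it is missing or unusable."""
    try:
        with open(path) as f:
            state = json.load(f)
    except (OSError, json.JSONDecodeError):
        return None
    return state if isinstance(state, dict) else None


def tasks_to_run(all_tasks, path):
    """Skip tasks recorded as completed; rerun everything if recovery fails."""
    state = load_saved_state(path)
    if state is None:
        return list(all_tasks)  # recovery failed: start from the very beginning
    done = set(state.get("completed", []))
    return [task for task in all_tasks if task not in done]


# Illustrative run with a temporary state file.
tasks = ["t1", "t2", "t3"]
with tempfile.TemporaryDirectory() as tmp:
    state_path = os.path.join(tmp, "am_state.json")
    with open(state_path, "w") as f:
        json.dump({"completed": ["t1"]}, f)
    resumed = tasks_to_run(tasks, state_path)
    restarted = tasks_to_run(tasks, os.path.join(tmp, "missing.json"))
```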
YARN scalability
The resource model for YARN v1 (MapReduce v2) is memory-driven. Every node in the system is
modeled as consisting of multiple containers of a minimum memory size. The ApplicationMaster
can request memory in multiples of this minimum size as needed.
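The arithmetic behind this memory-driven model can be sketched as below. The function names and the 1024 MB minimum are illustrative assumptions, and rounding a request up to the next multiple is an assumption about how a non-multiple request would be handled; the text only states that requests are made in multiples of the minimum size.

```python
def normalize_request(requested_mb, min_allocation_mb):
    """Round a memory request up to the next multiple of the minimum size."""
    if requested_mb <= 0 or min_allocation_mb <= 0:
        raise ValueError("memory sizes must be positive")
    multiples = -(-requested_mb // min_allocation_mb)  # ceiling division
    return multiples * min_allocation_mb


def max_containers(node_memory_mb, min_allocation_mb):
    """Number of minimum-size containers that fit in a node's memory."""
    return node_memory_mb // min_allocation_mb


# Illustrative values only.
granted = normalize_request(1500, 1024)   # two multiples of the minimum
exact = normalize_request(1024, 1024)     # already a multiple
per_node = max_containers(8192, 1024)
```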