Chapter 4. YARN
Apache YARN (Yet Another Resource Negotiator) is Hadoop's cluster resource management
system. YARN was introduced in Hadoop 2 to improve the MapReduce implementation,
but it is general enough to support other distributed computing paradigms as well.
YARN provides APIs for requesting and working with cluster resources, but these APIs are
not typically used directly by user code. Instead, users write to higher-level APIs provided
by distributed computing frameworks, which themselves are built on YARN and hide the
resource management details from the user. The situation is illustrated in Figure 4-1, which
shows some distributed computing frameworks (MapReduce, Spark, and so on) running as
YARN applications on the cluster compute layer (YARN) and the cluster storage layer
(HDFS and HBase).
Figure 4-1. YARN applications
There is also a layer of applications that build on the frameworks shown in Figure 4-1. Pig,
Hive, and Crunch are all examples of processing frameworks that run on MapReduce,
Spark, or Tez (or on all three), and don't interact with YARN directly.
This chapter walks through the features in YARN and provides a basis for understanding
later chapters in Part IV that cover Hadoop's distributed processing frameworks.