YARN - Hadoop: The Definitive Guide

Chapter 4. YARN

Apache YARN (Yet Another Resource Negotiator) is Hadoop's cluster resource management system. YARN was introduced in Hadoop 2 to improve the MapReduce implementation, but it is general enough to support other distributed computing paradigms as well.

YARN provides APIs for requesting and working with cluster resources, but these APIs are not typically used directly by user code. Instead, users write to higher-level APIs provided by distributed computing frameworks, which themselves are built on YARN and hide the resource management details from the user. The situation is illustrated in Figure 4-1, which shows some distributed computing frameworks (MapReduce, Spark, and so on) running as YARN applications on the cluster compute layer (YARN) and the cluster storage layer (HDFS and HBase).

Figure 4-1. YARN applications

There is also a layer of applications that build on the frameworks shown in Figure 4-1. Pig, Hive, and Crunch are all examples of processing frameworks that run on MapReduce, Spark, or Tez (or on all three), and don't interact with YARN directly.

This chapter walks through the features in YARN and provides a basis for understanding later chapters in Part IV that cover Hadoop's distributed processing frameworks.
