Getting Started with Apache Hadoop - Cloudera Administration - page 14


By default, the Hadoop daemons serve their web interfaces on the following ports:

Hadoop daemon         Port
Namenode              50070
Secondary namenode    50090
Jobtracker            50030
Datanode              50075
Tasktracker           50060

These ports can be changed in the hdfs-site.xml file (namenode, secondary namenode, and datanode) and the mapred-site.xml file (jobtracker and tasktracker).
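As a rough illustration of where these settings live, the Python sketch below maps each daemon to the Hadoop 1.x property that holds its web UI address (dfs.http.address, dfs.secondary.http.address, dfs.datanode.http.address, mapred.job.tracker.http.address, and mapred.task.tracker.http.address) and renders override snippets in the <configuration>/<property> layout that the site files use. The property names are given from the Hadoop 1.x defaults and should be checked against hdfs-default.xml and mapred-default.xml for your release; the helpers build_site_xml and port_overrides are hypothetical, written only for this example.

```python
import xml.etree.ElementTree as ET

# Assumed Hadoop 1.x web UI properties (verify against your release's *-default.xml).
# daemon -> (site file, property name, default bind address)
DEFAULT_HTTP_PORTS = {
    "Namenode": ("hdfs-site.xml", "dfs.http.address", "0.0.0.0:50070"),
    "Secondary namenode": ("hdfs-site.xml", "dfs.secondary.http.address", "0.0.0.0:50090"),
    "Datanode": ("hdfs-site.xml", "dfs.datanode.http.address", "0.0.0.0:50075"),
    "Jobtracker": ("mapred-site.xml", "mapred.job.tracker.http.address", "0.0.0.0:50030"),
    "Tasktracker": ("mapred-site.xml", "mapred.task.tracker.http.address", "0.0.0.0:50060"),
}


def build_site_xml(properties):
    """Hypothetical helper: render {name: value} as a Hadoop-style <configuration> string."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


def port_overrides(new_ports):
    """Hypothetical helper: group {daemon: port} changes by the site file they belong in."""
    per_file = {}
    for daemon, port in new_ports.items():
        filename, prop, default = DEFAULT_HTTP_PORTS[daemon]
        host = default.rsplit(":", 1)[0]  # keep the default bind host, change only the port
        per_file.setdefault(filename, {})[prop] = f"{host}:{port}"
    return {filename: build_site_xml(props) for filename, props in per_file.items()}


if __name__ == "__main__":
    # Move the namenode and jobtracker web UIs to non-default ports.
    for filename, xml in port_overrides({"Namenode": 51070, "Jobtracker": 51030}).items():
        print(f"--- {filename} ---")
        print(xml)
```

Each generated <property> element would be merged into the existing <configuration> block of the corresponding file rather than replacing the whole file.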

YARN is a general-purpose, distributed application management framework for processing data in Hadoop clusters.

YARN was built to solve the following two important problems:

• Support for large clusters (4000 nodes or more)

• The ability to run applications other than MapReduce, for example MPI and Apache Giraph, on data already stored in HDFS

In Hadoop version 1.x, MapReduce can be divided into the following two parts:

• The MapReduce user framework: This consists of the user's interaction with MapReduce, such as the MapReduce application programming interface (API)

• The MapReduce system: This consists of system-level tasks such as monitoring, scheduling, and restarting failed tasks
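To make the split between these two parts concrete, here is a toy Python model, not Hadoop's actual API: the word-count functions stand for the user framework, the only code an application author writes, while run_job and run_with_retries stand for the system side that schedules tasks, monitors them, and restarts failed ones. All names are hypothetical, and task failures are simulated through an injected attempt_fails callback.

```python
import itertools
from collections import defaultdict


# ---- "User framework": code the application author writes ----
def word_count_map(line):
    """Emit (word, 1) for every word in one input record."""
    for word in line.split():
        yield word.lower(), 1


def word_count_reduce(word, counts):
    """Sum all counts collected for one word."""
    return word, sum(counts)


# ---- "System": scheduling, monitoring, and restarting failed tasks ----
def run_with_retries(task, attempt_fails, stats, max_attempts=4):
    """Run one task, restarting it after a (simulated) failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            if attempt_fails():  # injected failure, stands in for a crashed task
                raise RuntimeError("simulated task failure")
            return task()
        except RuntimeError:
            stats["failed_attempts"] += 1  # "monitoring": record every failure
            if attempt == max_attempts:
                raise


def run_job(records, mapper, reducer, attempt_fails=lambda: False):
    """Schedule one map task per record and one reduce task per key."""
    stats = {"failed_attempts": 0}
    grouped = defaultdict(list)
    for record in records:
        pairs = run_with_retries(lambda r=record: list(mapper(r)), attempt_fails, stats)
        for key, value in pairs:
            grouped[key].append(value)
    results = {}
    for key in sorted(grouped):
        word, total = run_with_retries(lambda k=key: reducer(k, grouped[k]), attempt_fails, stats)
        results[word] = total
    return results, stats


if __name__ == "__main__":
    flaky = itertools.cycle([True, False])  # every task fails once, then succeeds
    counts, stats = run_job(["the quick fox", "The lazy dog"],
                            word_count_map, word_count_reduce,
                            attempt_fails=lambda: next(flaky))
    print(counts, stats)
```

The user-facing functions never deal with failures; the system side absorbs them. In Hadoop 1.x, that system half (scheduling, monitoring, and restarting tasks across every tasktracker) was handled by the single jobtracker daemon.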

The jobtracker daemon tightly coupled these two parts within itself and was responsible for managing tasks and all their related operations by interacting with the tasktracker daemons. As clusters grew and approached the 4,000-node mark, this responsibility overwhelmed the jobtracker, creating a scalability problem that needed to be fixed. In addition, MapReduce was the only way to process data in HDFS, because other tools were unable to process this data, which made the investment in Hadoop hard to justify. YARN was built to address these issues and is part of Hadoop
