By default, the ports for the Hadoop daemons are:
Hadoop daemon        Port
Namenode             50070
Secondary namenode   50090
Jobtracker           50030
Datanode             50075
Tasktracker          50060
These ports can be configured in the hdfs-site.xml and mapred-site.xml files.
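
As a rough illustration of how such an override might be read, the Python sketch below parses a site file and returns the configured web UI port for a daemon, falling back to the default when the property is absent. The property names (for example, dfs.http.address) are assumed here to be the Hadoop 1.x names for the web UI addresses and should be checked against the hdfs-default.xml and mapred-default.xml shipped with your version. The sample file content and the effective_port helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Default web UI ports, keyed by the (assumed) Hadoop 1.x property names.
DEFAULT_HTTP_PORTS = {
    "dfs.http.address": 50070,                  # namenode
    "dfs.secondary.http.address": 50090,        # secondary namenode
    "mapred.job.tracker.http.address": 50030,   # jobtracker
    "dfs.datanode.http.address": 50075,         # datanode
    "mapred.task.tracker.http.address": 50060,  # tasktracker
}


def effective_port(site_xml: str, prop: str) -> int:
    """Return the port set for prop in a *-site.xml document, else the default."""
    root = ET.fromstring(site_xml)
    for p in root.findall("property"):
        if p.findtext("name", "").strip() == prop:
            value = p.findtext("value", "").strip()  # expected form: "host:port"
            return int(value.rsplit(":", 1)[1])
    return DEFAULT_HTTP_PORTS[prop]


# Hypothetical hdfs-site.xml that moves the namenode web UI to port 50071.
SAMPLE_HDFS_SITE = """<configuration>
  <property>
    <name>dfs.http.address</name>
    <value>0.0.0.0:50071</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    print(effective_port(SAMPLE_HDFS_SITE, "dfs.http.address"))
    print(effective_port(SAMPLE_HDFS_SITE, "dfs.datanode.http.address"))
```

The first call reads the overridden value from the sample file. The second finds no matching property and falls back to the default.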
YARN is a general-purpose, distributed, application management framework for
processing data in Hadoop clusters.
YARN was built to solve the following two important problems:
• Support for large clusters (4000 nodes or more)
• The ability to run other applications apart from MapReduce to make use of data
already stored in HDFS, for example, MPI and Apache Giraph
In Hadoop Version 1.x, MapReduce can be divided into the following two parts:
• The MapReduce user framework: This consists of the user's interaction with
MapReduce, such as the application programming interface for MapReduce
• The MapReduce system: This consists of system-level tasks such as monitoring,
scheduling, and restarting of failed tasks
The jobtracker daemon had these two parts tightly coupled within itself and was
responsible for managing tasks and all their related operations by interacting with the
tasktracker daemon. This responsibility became overwhelming for the jobtracker daemon
as the number of nodes in a cluster grew and reached the 4000-node mark. This
was a scalability issue that needed to be fixed. Also, the investment in Hadoop was hard
to justify because MapReduce was the only way to process data on HDFS; other tools were
unable to process this data. YARN was built to address these issues and is part of Hadoop