TaskTracker
Like the DataNode in HDFS, the TaskTracker is the actual execution unit of Hadoop
MapReduce. It launches a child JVM for each Mapper or Reducer task. The maximum number
of concurrent Mapper tasks and of concurrent Reducer tasks per TaskTracker can be set
independently. The TaskTracker may reuse child JVMs across tasks of the same job to
improve efficiency.
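As a rough illustration of these knobs, the sketch below renders a mapred-site.xml style fragment. It assumes Hadoop 1.x (MRv1) property names; the values and the helper name render_mapred_site are illustrative only, and newer Hadoop versions use different settings.

```python
import xml.etree.ElementTree as ET

# Assumption: Hadoop 1.x (MRv1) property names. The values are examples only.
TASKTRACKER_SETTINGS = {
    # Concurrent Mapper tasks allowed on one TaskTracker.
    "mapred.tasktracker.map.tasks.maximum": "4",
    # Concurrent Reducer tasks allowed on one TaskTracker.
    "mapred.tasktracker.reduce.tasks.maximum": "2",
    # Tasks of one job that may share a child JVM; -1 means no limit.
    "mapred.job.reuse.jvm.num.tasks": "-1",
}


def render_mapred_site(settings):
    """Return a mapred-site.xml style document for the given properties."""
    root = ET.Element("configuration")
    for name, value in settings.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


xml_text = render_mapred_site(TASKTRACKER_SETTINGS)
print(xml_text)
```

The two slot limits are set per TaskTracker, whereas JVM reuse is usually chosen per job, so in practice it may be set in the job configuration instead.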
Reliability of data and processes in Hadoop
Hadoop is designed to be robust and reliable. It is meant to run on commodity hardware,
where failures are expected, and hence handles failure automatically. It detects the
failure of a task and retries the failed task. Data is stored redundantly: each block is
replicated across multiple DataNodes, so if a DataNode becomes unavailable, the system
heals itself by re-replicating that node's blocks from the surviving copies.
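The retry behavior can be pictured with a minimal sketch. This is not Hadoop code: run_with_retries and flaky_task are hypothetical names, and the default of four attempts is an assumption chosen to mirror the Hadoop 1.x default for per-task attempts.

```python
def run_with_retries(task, max_attempts=4):
    """Call task(attempt) until it succeeds or the attempts run out.

    Returns (result, attempts_used). Raises RuntimeError if every attempt fails.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return task(attempt), attempt
        except Exception as exc:  # a failed attempt; remember it and try again
            last_error = exc
    raise RuntimeError(f"task failed after {max_attempts} attempts") from last_error


def flaky_task(attempt):
    # Hypothetical task that fails on its first two attempts.
    if attempt < 3:
        raise IOError("simulated node failure")
    return "done"


result, attempts = run_with_retries(flaky_task)
print(result, attempts)
```

In a real cluster a retried task is typically scheduled on a different node, which this single-process sketch does not model.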
Hadoop allows servers to join or leave the cluster without interrupting service.
Rack-aware storage of data, which keeps replicas of a block on more than one rack,
protects the cluster against disk failures, machine or rack power failures, and even the
loss of a complete rack.
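To see why rack awareness survives the loss of a whole rack, consider the simplified sketch below. It assumes three replicas per block: one on the writing node and two on different nodes of another rack. The function names and topology are hypothetical, and the choice of nodes is deterministic here, whereas a real cluster picks among candidates at random.

```python
def place_replicas(topology, writer_node):
    """Pick three (rack, node) locations for a block.

    topology maps a rack name to its list of node names. Raises StopIteration
    if the writer is unknown or no other rack has at least two nodes.
    """
    local_rack = next(r for r, nodes in topology.items() if writer_node in nodes)
    remote_rack = next(
        r for r, nodes in topology.items() if r != local_rack and len(nodes) >= 2
    )
    first = (local_rack, writer_node)                 # replica on the writer
    second = (remote_rack, topology[remote_rack][0])  # replica on another rack
    third = (remote_rack, topology[remote_rack][1])   # same rack, other node
    return [first, second, third]


def survives_rack_loss(replicas, lost_rack):
    """True if at least one replica lives outside the lost rack."""
    return any(rack != lost_rack for rack, _ in replicas)


topology = {"rack1": ["node1", "node2"], "rack2": ["node3", "node4"]}
replicas = place_replicas(topology, "node1")
print(replicas)
print(all(survives_rack_loss(replicas, rack) for rack in topology))
```

Because the replicas span two racks, any single rack can go down and the block is still readable from the other one.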
The following figure shows the well-known layout of a reliable Hadoop infrastructure,
using commodity hardware for slaves and heavy-duty servers (top of the rack) for the
masters. Note that these are physical servers, as they are in the data centers. Later, when
we discuss using Cassandra as a data store for Hadoop, we will use a ring representation.
Even in that case, the physical configuration may be the same as the one represented
in the following figure, but the logical configuration, as we have seen throughout this
topic, will be a ring-like structure to emphasize the token distribution.