other in a standby state. If an active NameNode fails, the standby NameNode takes
over. When using the HDFS HA feature, a Secondary NameNode is unnecessary
[16].
Figure 10.2 illustrates a Hadoop cluster with ten machines and the storage of
one large file requiring three HDFS data blocks. Furthermore, this file is stored
using triple replication. The machines running the NameNode and the Secondary
NameNode are considered master nodes. Because the DataNodes take their
instructions from the master nodes, the machines running the DataNodes are
referred to as worker nodes.
Figure 10.2 A file stored in HDFS
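The three-fold replication shown in Figure 10.2 is governed by cluster configuration. A minimal sketch of the relevant properties in hdfs-site.xml, assuming the standard HDFS property names with illustrative values (3× replication and 128 MB blocks):

```xml
<configuration>
  <!-- Number of copies HDFS keeps of each data block (3 in Figure 10.2) -->
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <!-- Block size in bytes; a large file is split into blocks of this size -->
  <property>
    <name>dfs.blocksize</name>
    <value>134217728</value>
  </property>
</configuration>
```

The replication factor can also be overridden per file when it is written, so individual datasets may be stored with more or fewer copies than the cluster default.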
Structuring a MapReduce Job in Hadoop
Hadoop provides the ability to run MapReduce jobs as described, at a high level,
in Section 10.1.2. This section offers specific details on how a MapReduce job is
run in Hadoop. A typical MapReduce program in Java consists of three classes: the
driver, the mapper, and the reducer.
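Before looking at the Hadoop-specific classes, the data flow the driver wires together can be sketched without the framework. The following is a hypothetical, self-contained word-count example in plain Java: the `map` method plays the mapper's role, `shuffle` stands in for the grouping the framework performs between phases, and `reduce` plays the reducer's role (Hadoop's real API instead uses `Mapper` and `Reducer` subclasses configured by a `Job` object in the driver):

```java
import java.util.*;

// A framework-free sketch of the mapper -> shuffle -> reducer data flow
// that a Hadoop driver wires together (hypothetical word-count example).
public class MiniMapReduce {

    // Mapper role: emit a (word, 1) pair for every word in an input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) out.add(Map.entry(word, 1));
        }
        return out;
    }

    // Shuffle: group all emitted values by key (done by the framework in Hadoop).
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // Reducer role: sum the grouped values for each key.
    static Map<String, Integer> reduce(Map<String, List<Integer>> grouped) {
        Map<String, Integer> counts = new TreeMap<>();
        grouped.forEach((k, vs) ->
            counts.put(k, vs.stream().mapToInt(Integer::intValue).sum()));
        return counts;
    }

    // "Driver" role: feed each input record to map, then shuffle, then reduce.
    static Map<String, Integer> run(List<String> input) {
        List<Map.Entry<String, Integer>> mapped = new ArrayList<>();
        for (String line : input) mapped.addAll(map(line));
        return reduce(shuffle(mapped));
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("the cat sat", "the cat")));
        // prints {cat=2, sat=1, the=2}
    }
}
```

In Hadoop itself, the shuffle step is not written by the programmer at all; only the map and reduce logic and the driver wiring are supplied, which is what the next paragraphs describe.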
The driver provides details such as input file locations, the provisions for adding
the input file to the map task, the names of the mapper and reducer Java classes,
and the location of the reduce task output. Various job configuration options
can also be specified in the driver. For example, the number of reducers can be