other in a standby state. If an active NameNode fails, the standby NameNode takes
over. When using the HDFS HA feature, a Secondary NameNode is unnecessary
[16].
Figure 10.2 illustrates a Hadoop cluster with ten machines and the storage of
one large file requiring three HDFS data blocks. Furthermore, this file is stored
using triple replication. The machines running the NameNode and the Secondary
NameNode are considered master nodes. Because the DataNodes take their
instructions from the master nodes, the machines running the DataNodes are
referred to as worker nodes.
Figure 10.2 A file stored in HDFS
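The three-fold replication shown in Figure 10.2 is governed by cluster configuration. A minimal sketch of the relevant properties in hdfs-site.xml, assuming the standard HDFS property names with illustrative values (3× replication and 128 MB blocks):

```xml
<configuration>
  <!-- Number of copies HDFS keeps of each data block (3 in Figure 10.2) -->
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <!-- Block size in bytes; a large file is split into blocks of this size -->
  <property>
    <name>dfs.blocksize</name>
    <value>134217728</value>
  </property>
</configuration>
```

The replication factor can also be overridden per file when it is written, so individual datasets may be stored with more or fewer copies than the cluster default.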
Structuring a MapReduce Job in Hadoop
Hadoop provides the ability to run MapReduce jobs as described, at a high level,
in Section 10.1.2. This section offers specific details on how a MapReduce job is
run in Hadoop. A typical MapReduce program in Java consists of three classes: the
driver, the mapper, and the reducer.
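Before looking at the Hadoop-specific classes, the data flow the driver wires together can be sketched without the framework. The following is a hypothetical, self-contained word-count example in plain Java: the `map` method plays the mapper's role, `shuffle` stands in for the grouping the framework performs between phases, and `reduce` plays the reducer's role (Hadoop's real API instead uses `Mapper` and `Reducer` subclasses configured by a `Job` object in the driver):

```java
import java.util.*;

// A framework-free sketch of the mapper -> shuffle -> reducer data flow
// that a Hadoop driver wires together (hypothetical word-count example).
public class MiniMapReduce {

    // Mapper role: emit a (word, 1) pair for every word in an input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) out.add(Map.entry(word, 1));
        }
        return out;
    }

    // Shuffle: group all emitted values by key (done by the framework in Hadoop).
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // Reducer role: sum the grouped values for each key.
    static Map<String, Integer> reduce(Map<String, List<Integer>> grouped) {
        Map<String, Integer> counts = new TreeMap<>();
        grouped.forEach((k, vs) ->
            counts.put(k, vs.stream().mapToInt(Integer::intValue).sum()));
        return counts;
    }

    // "Driver" role: feed each input record to map, then shuffle, then reduce.
    static Map<String, Integer> run(List<String> input) {
        List<Map.Entry<String, Integer>> mapped = new ArrayList<>();
        for (String line : input) mapped.addAll(map(line));
        return reduce(shuffle(mapped));
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("the cat sat", "the cat")));
        // prints {cat=2, sat=1, the=2}
    }
}
```

In Hadoop itself, the shuffle step is not written by the programmer at all; only the map and reduce logic and the driver wiring are supplied, which is what the next paragraphs describe.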
The driver provides details such as input file locations, the provisions for adding
the input file to the map task, the names of the mapper and reducer Java classes,
and the location of the reduce task output. Various job configuration options
can also be specified in the driver. For example, the number of reducers can be