machine's storage to HDFS. In summary, the job of the DataNode is to
manage all the I/O (that is, read and write requests).
HDFS is also the point of integration for a new Microsoft technology called
Polybase, which you will learn more about in Chapter 10, “Data Warehouses
and Hadoop Integration.”
MapReduce
MapReduce is both an engine and a programming model. Users develop
MapReduce programs and submit them to the MapReduce engine for
processing. The programs created by the developers are known as jobs. Each
job is a combination of Java ARchive (JAR) files and classes required to
execute the MapReduce program. These files are themselves collated into a
single JAR file known as a job file.
Each MapReduce job can be broken down into a few key components. The
first phase of the job is the map. The map breaks the input up into many
tiny pieces so that it can then process each piece independently and in
parallel. Once complete, the results from this initial processing can be
collected, aggregated, and processed. This is the reduce part of the job.
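The two phases can be illustrated with a minimal word-count sketch in plain Python. This is not the Hadoop Java API; the function names and the in-memory "shuffle" step are purely illustrative of the model described above:

```python
from collections import defaultdict
from itertools import chain

# Map phase: each input split is turned into (key, value) pairs
# independently, so splits could run in parallel on different nodes.
def map_phase(split):
    return [(word, 1) for word in split.split()]

# Shuffle: group the intermediate pairs by key before reduction.
def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

# Reduce phase: aggregate the values collected for each key.
def reduce_phase(groups):
    return {key: sum(values) for key, values in groups.items()}

splits = ["big data big cluster", "big data"]
pairs = chain.from_iterable(map_phase(s) for s in splits)
result = reduce_phase(shuffle(pairs))
# result == {"big": 3, "data": 2, "cluster": 1}
```

In a real cluster, the map and reduce functions run on different nodes and the shuffle moves data across the network, but the logical flow is the same.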
The MapReduce engine is used to distribute the workload across the HDFS
cluster and is responsible for the execution of MapReduce jobs. The
MapReduce engine accepts jobs via the JobTracker. There is one JobTracker
per Hadoop cluster (the impact of which we discuss shortly). The
JobTracker provides the scheduling and orchestration of the MapReduce
engine; it does not actually process data itself.
To execute a job, the JobTracker communicates with the HDFS NameNode
to determine the location of the data to be analyzed. Once the location
is known, the JobTracker then speaks to another component of the
MapReduce engine called the TaskTracker. There are actually many
TaskTracker nodes in the Hadoop cluster. Each node of the cluster has its
own TaskTracker. Clearly then, the MapReduce engine is another master/
slave architecture.
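The division of labor described above can be sketched as a toy scheduler. This is not Hadoop code; the node names and data structures are invented for illustration. The point is the data-locality preference: the JobTracker, having learned from the NameNode where the block lives, tries to place the task on the TaskTracker co-located with that data before falling back to any tracker with capacity:

```python
# Toy model of JobTracker scheduling. trackers maps a node name to
# the number of free task slots on that node's TaskTracker.
def assign_task(block_location, trackers):
    # Prefer the node that already stores the data block (data locality).
    if trackers.get(block_location, 0) > 0:
        trackers[block_location] -= 1
        return block_location
    # Otherwise fall back to any node with a free slot.
    for node, slots in trackers.items():
        if slots > 0:
            trackers[node] -= 1
            return node
    return None  # no capacity anywhere; the task must wait

trackers = {"node1": 1, "node2": 2}
print(assign_task("node2", trackers))  # node2 (data-local)
print(assign_task("node2", trackers))  # node2 (still has a free slot)
print(assign_task("node2", trackers))  # node1 (node2 is now full)
```

A real JobTracker also tracks TaskTracker heartbeats, failed tasks, and rack topology, but the locality-first placement shown here is the core scheduling idea.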
TaskTrackers provide the execution engine for the MapReduce engine by
spawning a separate process for every task request. Therefore, the
JobTracker must identify the appropriate TaskTrackers to use by assessing
which are available to accept task requests and, ideally, which trackers are