Getting acquainted with MapReduce
Now that you have a solid knowledge base in HDFS, it is time to dive into the processing
module of Hadoop, known as MapReduce. Once the data is in the cluster, we need a
programming model to perform advanced operations on it. This is done using Hadoop's
MapReduce.
The MapReduce programming model has been in existence for quite some time now.
It was designed to process large volumes of data in parallel. Google implemented a
version of MapReduce in house to process the data stored on GFS, and later released
a paper explaining their implementation. Hadoop's MapReduce implementation is based
on this paper.
MapReduce in Hadoop is a Java-based distributed programming framework that leverages
the features of HDFS to execute high-performance batch processing of the data stored in
HDFS.
The processing is divided into two major functions:
• Map
• Reduce
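To make the two functions concrete, here is a minimal, single-process sketch of the MapReduce flow for the classic word-count example. This is a hypothetical illustration in plain Java, not the actual Hadoop API: the map step emits a (word, 1) pair for every word, the framework's shuffle step groups pairs by key, and the reduce step sums the values for each key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch only -- class and method names are hypothetical,
// not part of the Hadoop MapReduce API.
public class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every word in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // Reduce phase: sum all the values collected for a single key.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) sum += v;
        return sum;
    }

    public static void main(String[] args) {
        String[] input = {"the quick brown fox", "the lazy dog", "the fox"};

        // Shuffle: group mapped pairs by key, as the framework would do
        // between the map and reduce phases.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : input) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }

        // Reduce each group and print the final counts.
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            System.out.println(entry.getKey() + "\t" + reduce(entry.getValue()));
        }
    }
}
```

In a real Hadoop job, the map and reduce functions run in parallel across the cluster and the shuffle moves data between nodes; the logic of each phase, however, follows this same shape.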
Since the primary focus of this topic is on the administrative aspects of Hadoop, we will
focus on the MapReduce architecture and how it works together with HDFS to process
large volumes of data.