Getting acquainted with MapReduce
Now that you have a solid knowledge base in HDFS, it is time to dive into the processing
module of Hadoop, known as MapReduce. Once the data is in the cluster, we need a
programming model to perform advanced operations on it. This is done using Hadoop's
MapReduce.
The MapReduce programming model has been in existence for quite some time now.
It was designed to process large volumes of data in parallel. Google implemented a
version of MapReduce in house to process the data stored on GFS, and later released
a paper explaining their implementation. Hadoop's MapReduce implementation is based
on this paper.
MapReduce in Hadoop is a Java-based distributed programming framework that leverages
the features of HDFS to execute high-performance batch processing of the data stored in
HDFS.
The processing is divided into two major functions:
• Map
• Reduce
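To make the two functions concrete, here is a minimal, single-process sketch of the MapReduce flow for the classic word-count example. This is a hypothetical illustration in plain Java, not the actual Hadoop API: the map step emits a (word, 1) pair for every word, the framework's shuffle step groups pairs by key, and the reduce step sums the values for each key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch only -- class and method names are hypothetical,
// not part of the Hadoop MapReduce API.
public class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every word in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // Reduce phase: sum all the values collected for a single key.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) sum += v;
        return sum;
    }

    public static void main(String[] args) {
        String[] input = {"the quick brown fox", "the lazy dog", "the fox"};

        // Shuffle: group mapped pairs by key, as the framework would do
        // between the map and reduce phases.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : input) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }

        // Reduce each group and print the final counts.
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            System.out.println(entry.getKey() + "\t" + reduce(entry.getValue()));
        }
    }
}
```

In a real Hadoop job, the map and reduce functions run in parallel across the cluster and the shuffle moves data between nodes; the logic of each phase, however, follows this same shape.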
Since the primary focus of this topic is on the administrative aspects of Hadoop, we will
focus on the MapReduce architecture and how it works together with HDFS to process
large volumes of data.